Commit Graph

1400 Commits

Author SHA1 Message Date
Craig Topper 9224c114a9 [X86] Mark the default case of the X86InstrInfo::convertToThreeAddress switch as unreachable.
This function should only be called with instructions that are really convertible. And all
convertible instructions need to be handled by the switch. So nothing should use the default.

llvm-svn: 357529
2019-04-02 20:52:16 +00:00
Craig Topper 7c9afc35bc [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.

When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.

This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.

Fixes PR41055

Differential Revision: https://reviews.llvm.org/D59391

llvm-svn: 357096
2019-03-27 17:29:34 +00:00
Craig Topper b4c49255aa [X86] Make ADD*_DB post-RA pseudos and expand them in expandPostRAPseudo.
These are used to help convert OR->LEA when needed to avoid avoid a copy. They
aren't need after register allocation.

Happens to remove an ugly goto from X86MCCodeEmitter.cpp

llvm-svn: 356356
2019-03-18 05:48:18 +00:00
Craig Topper 4cf8cdc51d [X86] Remove VCVTSI2SDZrrb_Int as it shouldn't exist.
This would convert a signed 32-bit integer to double precision with rounding. But there's nothing to round.

llvm-svn: 355795
2019-03-11 01:20:37 +00:00
Craig Topper 4a9dd7c39b [X86] Enable 8-bit SHL to convert to LEA
Differential Revision: https://reviews.llvm.org/D58870

llvm-svn: 355425
2019-03-05 18:37:41 +00:00
Craig Topper 216bf7f03b [X86] Allow 8-bit INC/DEC to be converted to LEA.
We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too.

Differential Revision: https://reviews.llvm.org/D58869

llvm-svn: 355424
2019-03-05 18:37:37 +00:00
Craig Topper 572e94ca02 [X86] Enable 8-bit OR with disjoint bits to convert to LEA
We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64.

Differential Revision: https://reviews.llvm.org/D58863

llvm-svn: 355423
2019-03-05 18:37:33 +00:00
Craig Topper 240315aa64 [X86] Use X86::LAST_VALID_COND instead of assuming X86::COND_S is the last encoding. NFC
llvm-svn: 355059
2019-02-28 01:00:31 +00:00
Simon Pilgrim 398d0b9e96 Fix MSVC constant truncation warnings. NFCI.
llvm-svn: 354731
2019-02-23 18:49:02 +00:00
Craig Topper 75afc0105c [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.

This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.

The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.

llvm-svn: 354724
2019-02-23 08:34:10 +00:00
Craig Topper 61da80584d [X86] Don't prevent load folding for cvtsi2ss/cvtsi2sd based on hasPartialRegUpdate.
Preventing the load fold won't fix the partial register update since the
input we can fold is a GPR. So it will do nothing to prevent a false dependency
on an XMM register.

llvm-svn: 354193
2019-02-16 03:34:54 +00:00
Matt Arsenault 2a5488b877 X86: Replace isSafeToClobberEFLAGS implementation
Also use modifiesRegister instead of looping over operands.

llvm-svn: 354098
2019-02-15 04:01:39 +00:00
Craig Topper 41a1792b15 [X86] Remove isReMaterializable from X87 floating point constant loads and constant pool loads.
Summary: These instructions update FPSW so they aren't generically safe to rematerialize into any location if FPSW is live for a comparison result. They also use FPCW for exception masking control. Though the only exception they can generate is stack overflow and we manage the stack ourselves so that's not really going to occur.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57934

llvm-svn: 353536
2019-02-08 17:07:54 +00:00
Simon Pilgrim e95550f508 [X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mapping
Noticed in D57514.

Differential Revision: https://reviews.llvm.org/D57519

llvm-svn: 352922
2019-02-01 21:41:30 +00:00
Simon Pilgrim 9e2c2cfcd9 Fix "comparison of unsigned expression >= 0 is always true" warning. NFCI.
llvm-svn: 351816
2019-01-22 13:18:26 +00:00
Simon Pilgrim 180fcff5a7 [X86][SSE] Add selective commutation support for insertps (PR40340)
When we are inserting 1 "inline" element, and zeroing 2 of the other elements then we can safely commute the insertps source inputs to improve memory folding.

Differential Revision: https://reviews.llvm.org/D56843

llvm-svn: 351807
2019-01-22 12:17:48 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Craig Topper 62ec024d3b [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used.
The BEXTR instruction documents the SF bit as undefined.

The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed.

Fixes PR40060

Differential Revision: https://reviews.llvm.org/D55807

llvm-svn: 349956
2018-12-21 21:16:26 +00:00
Craig Topper 18a9d545e1 [X86] Add BSR to isUseDefConvertible.
We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there.

This addresses one issue from PR40090.

llvm-svn: 349531
2018-12-18 20:03:54 +00:00
Craig Topper 1ff7356f96 [X86] Const correct some helper functions X86InstrInfo.cpp. NFC
llvm-svn: 349440
2018-12-18 04:58:05 +00:00
Craig Topper 8c9d772991 [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr.
These seem to have been missed when the other TBM instructions were added.

llvm-svn: 349404
2018-12-17 21:50:06 +00:00
Sanjay Patel 44eaa492b8 [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. 
That allows us to combine add ops with register moves and save some instructions. This is 
another step towards allowing add truncation in generic DAGCombiner (see D54640).

Differential Revision: https://reviews.llvm.org/D55494

llvm-svn: 348946
2018-12-12 17:58:27 +00:00
Sanjay Patel 05e36982dd [x86] clean up code for converting 16-bit ops to LEA; NFC
As discussed in D55494, we want to extend this to handle 8-bit
ops too, but that could be extended further to enable this on
32-bit systems too.

llvm-svn: 348851
2018-12-11 15:29:40 +00:00
Sanjay Patel 9765ba5f86 [x86] remove dead code for 16-bit LEA formation; NFC
As discussed in:
D55494
...this code has been disabled/dead for a long time (the code references
Athlon and Pentium 4), and there's almost no chance that it will be used 
given the last decade of uarch evolution. Also, in SDAG we promote 16-bit 
ops to 32-bit, so there's almost no way to test this code any more.

llvm-svn: 348845
2018-12-11 14:05:03 +00:00
Sanjay Patel 19bc850220 [x86] don't try to convert add with undef operands to LEA
The existing code tries to handle an undef operand while transforming an add to an LEA, 
but it's incomplete because we will crash on the i16 test with the debug output shown below. 
It's better to just give up instead. Really, GlobalIsel should have folded these before we 
could get into trouble.

# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected

bb.0 (%ir-block.0):
  liveins: $edi
  %1:gr32 = COPY killed $edi
  %0:gr16 = COPY %1.sub_16bit:gr32
  %5:gr64_nosp = IMPLICIT_DEF
  %5.sub_16bit:gr64_nosp = COPY %0:gr16
  %6:gr64_nosp = IMPLICIT_DEF
  %6.sub_16bit:gr64_nosp = COPY %2:gr16
  %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
  %3:gr16 = COPY killed %4.sub_16bit:gr32
  $ax = COPY killed %3:gr16
  RET 0, implicit killed $ax

# End machine code for function add_undef_i16.

*** Bad machine code: Reading virtual register without a def ***
- function:    add_undef_i16
- basic block: %bb.0  (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1:   %2:gr16
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D54710

llvm-svn: 348722
2018-12-09 14:40:37 +00:00
Francis Visoiu Mistrih d7eebd6d83 [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

llvm-svn: 347746
2018-11-28 12:00:20 +00:00
Evandro Menezes 9ef79c884a [TableGen] Refactor macro names (NFC)
Make the names for the macros for `TargetInstrInfo` uniform.

llvm-svn: 347706
2018-11-27 20:58:27 +00:00
Matthias Braun c6613879ce LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysRegs. The later is only 16bits and
saves us a bit of memory.

llvm-svn: 346254
2018-11-06 19:00:11 +00:00
Craig Topper 6c3f1692c8 Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction"
Google is reporting regressions on some benchmarks.

llvm-svn: 345785
2018-10-31 21:53:24 +00:00
Andrea Di Biagio 3d2b7176fc [tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands.
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:

  Foo(MI->getOperand(0).getImm()) == ExpectedVal;

Here, the immediate operand value at index zero is passed in input to function
Foo, and ExpectedVal is compared against the value returned by function Foo.

While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.

With this patch, we allow users to specify if a register/immediate operand value
needs to be passed in input to a function as part of the predicate check. Now,
register/immediate operand checks all derive from base class CheckOperandBase.

This patch also changes where TIIPredicate definitions are expanded by the
instructon info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).

This patch is a non functional change for the existing scheduling models.
In future, we will be able to use this richer syntax to better describe complex
scheduling predicates, and expose them to llvm-mca.

Differential Revision: https://reviews.llvm.org/D53880

llvm-svn: 345714
2018-10-31 12:28:05 +00:00
Reid Kleckner 49a24278ba [ELF] Fix large code model MIR verifier errors
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.

Fixes some X86 verifier errors tracked in PR27481.

llvm-svn: 345219
2018-10-24 22:57:28 +00:00
Craig Topper 2417273255 [X86] Bring back the MOV64r0 pseudo instruction
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.

My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.

There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.

Differential Revision: https://reviews.llvm.org/D52757

llvm-svn: 345165
2018-10-24 17:32:09 +00:00
Matthias Braun a0beeffeed X86: Do not optimize branches with undef eflags inputs
analyzeBranch()/insertBranch() etc. do not properly deal with an undef
flag on the eflags input and used to produce invalid MIR.  I don't see
this ever affecting real world inputs (I don't think it is possible to
produce undef flags with llvm IR), so I simply changed the code to bail
out in this case.

rdar://42122367

llvm-svn: 344970
2018-10-22 22:52:23 +00:00
Matthias Braun 81578e9f77 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895
2018-10-05 22:00:13 +00:00
Matt Morehouse 4b1ec17fb0 Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions"
This reverts r343520 due to breakage of HWASan tests on Android.

llvm-svn: 343616
2018-10-02 18:35:44 +00:00
Matthias Braun 3e081703c3 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343520
2018-10-01 18:56:39 +00:00
Craig Topper 1d1dca6a6f [X86] Change an llvm_unreachable to a report_fatal_error so the optimizer will stop making us reach the other report_fatal_error in this function.
There's a conditional report_fatal_error just above this llvm_unreachable. The optimizer when seeing the unreachable removes the conditional and just makes any other error trigger the existing report_fatal_error.

llvm-svn: 343428
2018-09-30 23:43:30 +00:00
Sander de Smalen c91b27d9ee Remove FrameAccess struct from hasLoadFromStackSlot
This removes the FrameAccess struct that was added to the interface
in D51537, since the PseudoValue from the MachineMemoryOperand
can be safely casted to a FixedStackPseudoSourceValue.

Reviewers: MatzeB, thegameg, javed.absar

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51617

llvm-svn: 341454
2018-09-05 08:59:50 +00:00
Sander de Smalen 6cab60fa06 Extend hasStoreToStackSlot with list of FI accesses.
For instructions that spill/fill to and from multiple frame-indices
in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot
should return an array of accesses, rather than just the first encounter
of such an access.

This better describes FI accesses for AArch64 (paired) LDP/STP
instructions.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D51537

llvm-svn: 341301
2018-09-03 09:15:58 +00:00
Alexander Ivchenko af96112ec6 Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions
..Move all target-dependent checks into new isCopyInstrImpl method.

This change allows us to treat MoveReg-type instructions and generic
COPY instruction in the same way

Differential Revision: https://reviews.llvm.org/D49913

llvm-svn: 341072
2018-08-30 14:32:47 +00:00
Martin Storsjo 489993db94 [MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll
Variables declared with the dllimport attribute are accessed via a
stub variable named __imp_<var>. In MinGW configurations, variables that
aren't declared with a dllimport attribute might still end up imported
from another DLL with runtime pseudo relocs.

For x86_64, this avoids the risk that the target is out of range
for a 32 bit PC relative reference, in case the target DLL is loaded
further than 4 GB from the reference. It also avoids having to make the
text section writable at runtime when doing the runtime fixups, which
makes it worthwhile to do for i386 as well.

Add stub variables for all dso local data references where a definition
of the variable isn't visible within the module, since the DLL data
autoimporting might make them imported even though they are marked as
dso local within LLVM.

Don't do this for variables that actually are defined within the same
module, since we then know for sure that it actually is dso local.

Don't do this for references to functions, since there's no need for
runtime pseudo relocations for autoimporting them; if a function from
a different DLL is called without the appropriate dllimport attribute,
the call just gets routed via a thunk instead.

GCC does something similar since 4.9 (when compiling with -mcmodel=medium
or large; from that version, medium is the default code model for x86_64
mingw), but only for x86_64.

Differential Revision: https://reviews.llvm.org/D51288

llvm-svn: 340942
2018-08-29 17:28:34 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Craig Topper 570d47a010 [X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG.
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.

This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test.

llvm-svn: 339496
2018-08-11 05:33:00 +00:00
Craig Topper 0423881820 [X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes
These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference.

Unfortunately, this really only works in cases where the input register is killed by the instruction. If its not killed, the two address isntruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with tied operand to trick the two address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs.

Differential Revision: https://reviews.llvm.org/D50157

llvm-svn: 338735
2018-08-02 16:48:01 +00:00
Francis Visoiu Mistrih 7d003657de [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction
The machine verifier asserts with:

Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542.

It calls analyzeBranch which tries to call getMBB if the opcode is
JMP_1, but in this case we do:

JMP_1 @OUTLINED_FUNCTION

I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used
with brtarget8.

Differential Revision: https://reviews.llvm.org/D49299

llvm-svn: 338237
2018-07-30 09:59:33 +00:00
Jessica Paquette 69f517df27 [MachineOutliner][NFC] Move target frame info into OutlinedFunction
Just some gardening here.

Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.

Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.

llvm-svn: 337848
2018-07-24 20:13:10 +00:00
Chandler Carruth c9313a9ecb [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr* and fix latent bugs with register class updates.
Summary:
Enabling this fully exposes a latent bug in the instruction folding: we
never update the register constraints for the register operands when
fusing a load into another operation. The fused form could, in theory,
have different register constraints on its operands. And in fact,
TCRETURNm* needs its memory operands to use tailcall compatible
registers.

I've updated the folding code to re-constrain all the registers after
they are mapped onto their new instruction.

However, we still can't enable folding in the general case from
TCRETURNr* to TCRETURNm* because doing so may require more registers to
be available during the tail call. If the call itself uses all but one
register, and the folded load would require both a base and index
register, there will not be enough registers to allocate the tail call.

It would be better, IMO, to teach the register allocator to *unfold*
TCRETURNm* when it runs out of registers (or specifically check the
number of registers available during the TCRETURNr*) but I'm not going
to try and solve that for now. Instead, I've just blocked the forward
folding from r -> m, leaving LLVM free to unfold from m -> r as that
doesn't introduce new register pressure constraints.

The down side is that I don't have anything that will directly exercise
this. Instead, I will be immediately using this it my SLH patch. =/

Still worse, without allowing the TCRETURNr* -> TCRETURNm* fold, I don't
have any tests that demonstrate the failure to update the memory operand
register constraints. This patch still seems correct, but I'm nervous
about the degree of testing due to this.

Suggestions?

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49717

llvm-svn: 337845
2018-07-24 19:04:37 +00:00
Jessica Paquette fca55129b1 [MachineOutliner][NFC] Make Candidates own their call information
Before this, TCI contained all the call information for each Candidate.

This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.

A follow-up patch for the AArch64 outliner will demonstrate this.

llvm-svn: 337840
2018-07-24 17:42:11 +00:00
Reid Kleckner 980c4df037 Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.

llvm-svn: 337740
2018-07-23 21:14:35 +00:00
Craig Topper 92ea7a7b48 [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address.
This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.

llvm-svn: 337357
2018-07-18 07:31:32 +00:00
Craig Topper cdb0ed2910 [X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ.
AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.

Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.

This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.

llvm-svn: 337137
2018-07-15 23:32:36 +00:00
Craig Topper 7426cf6717 [X86] Fix a subtle bug in the custom execution domain fixing for blends.
The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.

Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file.

llvm-svn: 337088
2018-07-14 06:30:30 +00:00
Craig Topper 02867f0fa3 [X86] The TEST instruction is eliminated when BSF/TZCNT is used
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
  %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
  %tobool = icmp eq i32 %v, 0
  %.op = add nuw nsw i32 %cnt, 1
  %add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
  bsfl     %edi, %ecx
  addl     $1, %ecx
  testl    %edi, %edi
  movl     $0, %eax
  cmovnel  %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.

We now produce the following code:
  bsfl     %edi, %ecx
  movl     $-1, %eax
  cmovnel  %ecx, %eax
  addl     $1, %eax

Patch by Ivan Kulagin

Reviewers: davide, craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48765

llvm-svn: 336768
2018-07-11 06:57:42 +00:00
Craig Topper 860ab496d3 [X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it.
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.

Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.

llvm-svn: 336731
2018-07-10 22:02:23 +00:00
Yvan Roux eaececf5e0 [MachineOutliner] Fix typo in getOutliningCandidateInfo function name
getOutlininingCandidateInfo -> getOutliningCandidateInfo

Differential Revision: https://reviews.llvm.org/D48867

llvm-svn: 336285
2018-07-04 15:37:08 +00:00
Craig Topper 0661f67296 [X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search.
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.

Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but its probably not worth it right now.

llvm-svn: 336075
2018-07-02 06:23:39 +00:00
Craig Topper c004aa6c5f [X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl.
findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.

llvm-svn: 336070
2018-07-01 23:27:41 +00:00
Craig Topper 4e78213ae4 [X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic.
Also move the static folding tables, their search functions and the new class into new cpp/h files.

The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables.

By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed.

llvm-svn: 336056
2018-07-01 05:47:49 +00:00
Craig Topper 84199deb17 [X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI
The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use.

This patch moves that static function out of the class and moves it implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic.

llvm-svn: 336055
2018-06-30 22:38:42 +00:00
Craig Topper 7c96f051d2 [X86] Use a std::vector for the memory unfolding table.
Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated.

This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups.

Differential Revision: https://reviews.llvm.org/D48585

llvm-svn: 335994
2018-06-29 17:11:26 +00:00
Jonas Devlieghere b757fc3878 Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.

  LLVM ERROR: unsupported relocation with subtraction expression, symbol
  '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
  expression

llvm-svn: 335894
2018-06-28 17:56:43 +00:00
Benjamin Kramer f9613b2995 Unify sorted asserts to use the existing atomic pattern
These are all benign races and only visible in !NDEBUG. tsan complains
about it, but a simple atomic bool is sufficient to make it happy.

llvm-svn: 335823
2018-06-28 10:03:45 +00:00
Benjamin Kramer e214f046af [X86] Make folding table checking threadsafe
This is a benign race, but tsan likes to complain about it. Just make it
happy.

llvm-svn: 335788
2018-06-27 21:01:53 +00:00
Craig Topper 33aba0eb4c [X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group.
Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.

The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.

llvm-svn: 335694
2018-06-27 00:42:24 +00:00
Craig Topper 614f192471 [X86] Add comment about the sorting of the memory folding tables added in r335501.
llvm-svn: 335517
2018-06-25 20:11:16 +00:00
Reid Kleckner 88fee5fdbc Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335508
2018-06-25 18:16:27 +00:00
Craig Topper 3cc6cb1d35 [X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search.
With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed.

We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it.

Differential Revision: https://reviews.llvm.org/D48527

llvm-svn: 335501
2018-06-25 17:26:56 +00:00
Craig Topper facea6b4a6 [X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands.
We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted.

The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices.

With that abort case pushed earlier, we can remove all the abort checks and replace with asserts.

llvm-svn: 335446
2018-06-25 06:05:37 +00:00
Craig Topper 19772c89c7 [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions.
llvm-svn: 335428
2018-06-24 06:29:50 +00:00
Reid Kleckner 3a2fd1c2f3 Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
MCJIT can't handle R_X86_64_GOT64 yet.

llvm-svn: 335300
2018-06-21 22:19:05 +00:00
Reid Kleckner 3286a6c896 [X86] Commit some comments that weren't in the medium code model patch
llvm-svn: 335298
2018-06-21 21:57:44 +00:00
Reid Kleckner 247fe6aeab [X86] Implement more of x86-64 large and medium PIC code models
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

Reviewers: chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335297
2018-06-21 21:55:08 +00:00
Craig Topper c2696d577b [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel
I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code.

I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.

Differential Revision: https://reviews.llvm.org/D43608

llvm-svn: 335173
2018-06-20 21:05:02 +00:00
Jessica Paquette 32de26d432 [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.

This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

llvm-svn: 335076
2018-06-19 21:14:48 +00:00
Craig Topper 9fe45d846e [X86] Add all the FMA instructions direclty to the load folding table instead of proxying through X86InstrFMA3Info.
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.

llvm-svn: 334915
2018-06-17 18:00:16 +00:00
Craig Topper 29f22d7baa [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

llvm-svn: 334898
2018-06-16 23:25:50 +00:00
Craig Topper d00e375310 [X86] Add more instructions to the hasUndefRegUpdate list.
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future.

llvm-svn: 334867
2018-06-15 22:25:04 +00:00
Craig Topper 1657b7b8d2 [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate.
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.

llvm-svn: 334848
2018-06-15 17:56:17 +00:00
Craig Topper c8a763ed84 Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update."
There's a typo causing the build to fail.

llvm-svn: 334803
2018-06-15 06:15:26 +00:00
Craig Topper 5ec210cc27 [X86] Prevent folding stack reloads with instructions that have an undefined register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

llvm-svn: 334802
2018-06-15 06:11:36 +00:00
Craig Topper 3c4cc01226 [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.

llvm-svn: 334800
2018-06-15 05:49:19 +00:00
Craig Topper f43807dd89 [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency.
llvm-svn: 334785
2018-06-15 04:42:54 +00:00
Craig Topper bfa94d5086 [X86] Fix stale comment in folding tables.
llvm-svn: 334758
2018-06-14 19:28:31 +00:00
Craig Topper 3ffeb41f6b [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide.
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.

llvm-svn: 334728
2018-06-14 15:40:31 +00:00
Craig Topper b0742bf30d [X86] Disable load unfolding for a bunch of instruction where unfolding would increase the size of the load.
Found by an audit of the manual table vs the autogenerated table.

llvm-svn: 334726
2018-06-14 15:40:29 +00:00
Craig Topper f7f663e0a9 [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.

llvm-svn: 334648
2018-06-13 20:03:42 +00:00
Craig Topper ede97c9548 [X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table.
llvm-svn: 334511
2018-06-12 15:48:03 +00:00
Tomasz Krupa f8c7637027 [X86] Block UndefRegUpdate
Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update.

Reviewers: craig.topper

Subscribers: llvm-commits, mike.dvoretsky, ashlykov

Differential Revision: https://reviews.llvm.org/D47621

llvm-svn: 334175
2018-06-07 08:48:45 +00:00
Petar Jovanovic 8cb6a521be Change TII isCopyInstr way of returning arguments(NFC)
Make TII isCopyInstr() return MachineOperands through pointer to pointer
instead via reference.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D47364

llvm-svn: 334105
2018-06-06 16:36:30 +00:00
Jessica Paquette aa087327ce [MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h
This is setting up to fix bug 37573 cleanly.

This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.

Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.

Also, as a side-effect, it makes the outliner a bit cleaner.

https://bugs.llvm.org/show_bug.cgi?id=37573

llvm-svn: 333952
2018-06-04 21:14:16 +00:00
Craig Topper 6b545182fb [X86] Fix typo in comment. NFC
llvm-svn: 333382
2018-05-28 19:33:06 +00:00
Petar Jovanovic c051000b83 [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Eli Friedman d268bf0a4d Fix unused lambda capture.
llvm-svn: 332686
2018-05-18 02:11:25 +00:00
Eli Friedman 4081a57af7 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Shiva Chen 801bf7ebbe [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Andrea Di Biagio d4c58400c5 [X86] Correct spill slot size.
This patch fixes a bug introduced by revision 330778 (originally reviewed at:
https://reviews.llvm.org/D44782), where function isFrameLoadOpcode returned
the wrong number of bytes read for opcodes VMOVSSrm and VMOVSDrm.

This corrects that mistake, and extends the regression test to catch cases where
the dead stores should be removed.

Patch by Jeremy Morse.

Differential Revision: https://reviews.llvm.org/D46256

llvm-svn: 331252
2018-05-01 10:29:38 +00:00
Craig Topper 8a6532ae84 [X86] Rename BNDMOV instructions and hide redundant instruction encoding from the assembler.
Favor the 0x1a encoding for register/register move to match gas.

The instructions used RM and MR in their name along with rr/rm/mr at the end. To make more consistent with other instructions remove the RM/MR and use rr/rm/mr/rr_REV.

Hide the _REV encoding from the assembler but leave it for the disassembler.

llvm-svn: 331101
2018-04-28 06:02:39 +00:00