Commit Graph

Nikita Popov f6058ff140 [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.

This is a followup to D55787 and part of PR40056.
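
As a reminder of the semantics involved, here is a minimal C++ sketch of
what @llvm.sadd.sat computes per scalar element (illustrative only, not
LLVM source):

  #include <cstdint>
  #include <limits>

  // Saturating signed add: clamp to the representable range on overflow.
  int32_t sadd_sat(int32_t a, int32_t b) {
    int64_t Sum = static_cast<int64_t>(a) + b;
    if (Sum > std::numeric_limits<int32_t>::max())
      return std::numeric_limits<int32_t>::max();
    if (Sum < std::numeric_limits<int32_t>::min())
      return std::numeric_limits<int32_t>::min();
    return static_cast<int32_t>(Sum);
  }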

Differential Revision: https://reviews.llvm.org/D55833

llvm-svn: 349520
2018-12-18 18:28:22 +00:00
Craig Topper 20a6db5a84 [X86] Create PSUBUS from (add (umax X, C), -C)
InstCombine seems to canonicalize our PSUBUS pattern into a max with the constant and an add with the inverse of the constant.

This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling.

Fixes some of PR40053
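
A minimal C++ illustration (not LLVM source) of why the canonicalized
form is equivalent to an unsigned saturating subtract:

  #include <algorithm>
  #include <cstdint>

  // What InstCombine produces: umax with C, then add -C (mod 2^32).
  uint32_t canonical(uint32_t X, uint32_t C) {
    return std::max(X, C) + static_cast<uint32_t>(0u - C);
  }

  // What PSUBUS computes: X - C clamped at zero. The two functions
  // agree for all X and C.
  uint32_t psubus(uint32_t X, uint32_t C) {
    return X > C ? X - C : 0;
  }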

Differential Revision: https://reviews.llvm.org/D55780

llvm-svn: 349519
2018-12-18 18:26:25 +00:00
Simon Pilgrim 1411917431 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349510
2018-12-18 17:31:11 +00:00
Simon Pilgrim e9effe9744 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349500
2018-12-18 16:02:23 +00:00
Petar Avramovic 0a5e4eb776 [MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.

Differential Revision: https://reviews.llvm.org/D55651

llvm-svn: 349499
2018-12-18 15:59:51 +00:00
Nikita Popov 665ab08178 [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.

This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.

Differential Revision: https://reviews.llvm.org/D55787

llvm-svn: 349481
2018-12-18 13:23:03 +00:00
Petar Avramovic 150fd430f6 [MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND, G_OR and G_XOR for types other than s32
with clampScalar on MIPS32.

Differential Revision: https://reviews.llvm.org/D55362

llvm-svn: 349475
2018-12-18 11:36:14 +00:00
Luke Cheeseman f57d7d8237 [AArch64] - Return address signing dwarf support
- Reapply changes initially introduced in r343089
- The architecture info is no longer loaded whenever a DWARFContext is created
- The runtime libraries (sanitizers) make use of the dwarf context classes but
  do not initialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
  string printing later on

Differential Revision: https://reviews.llvm.org/D55774

llvm-svn: 349472
2018-12-18 10:37:42 +00:00
Matt Arsenault c94e26c71d AMDGPU: Legalize/regbankselect frame_index
llvm-svn: 349468
2018-12-18 09:46:13 +00:00
Matt Arsenault c0ea221068 AMDGPU: Legalize/regbankselect fma
llvm-svn: 349467
2018-12-18 09:39:56 +00:00
Matt Arsenault e01e7c81f2 AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub
llvm-svn: 349463
2018-12-18 09:19:03 +00:00
Simon Pilgrim 8488a44c34 [X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits
(VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1

This works better as part of SimplifyDemandedBits than part of the general combine.
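
A scalar C++ sketch of the folded pattern (assuming 32-bit elements and
an arithmetic right shift on signed values; illustrative only):

  #include <cstdint>

  // shl+sra by the same amount sign-extends the low (32 - C1) bits;
  // if X already has more than C1 sign bits, the pair is the identity.
  int32_t sext_in_reg(int32_t X, unsigned C1) {
    return static_cast<int32_t>(static_cast<uint32_t>(X) << C1) >> C1;
  }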

llvm-svn: 349462
2018-12-18 09:11:34 +00:00
Simon Pilgrim 26c630f416 [X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold.
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same).

Test change is merely a rescheduling.

llvm-svn: 349459
2018-12-18 08:55:47 +00:00
Kristof Beyls e66bc1f756 Introduce control flow speculation tracking pass for AArch64
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating SpectreV1-style vulnerabilities.

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask off data loaded
  in registers.
- using intrinsics to mask off data in registers as indicated by the
  programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
  relative abundance of registers, one register is reserved (X16) to be
  the taint register. X16 is expected to not clash with other register
  reservation mechanisms with very high probability because:
  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  . The only way to request X16 to be used as a programmer is through
    inline assembly. In the rare case a function explicitly demands to
    use X16/W16, this pass falls back to hardening against speculation
    by inserting a DSB SYS/ISB barrier pair which will prevent control
    flow speculation.
- It is easy to insert mask operations at this late stage as we have
  mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
  and contains all-zeros when miss-speculation is detected. Therefore, when
  masking, an AND instruction (which only changes the register to be masked,
  no other side effects) can easily be inserted anywhere that's needed (see
  the sketch after this list).
- The tracking of miss-speculation is done by using a data-flow conditional
  select instruction (CSEL) to evaluate the flags that were also used to
  make conditional branch direction decisions. Speculation of the CSEL
  instruction can be limited with a CSDB instruction - so the combination of
  CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
  aren't speculated. When conditional branch direction gets miss-speculated,
  the semantics of the inserted CSEL instruction are such that the taint
  register will contain all zero bits.
  One key requirement for this to work is that the conditional branch is
  followed by an execution of the CSEL instruction, where the CSEL
  instruction needs to use the same flags status as the conditional branch.
  This means that the conditional branches must not be implemented as one
  of the AArch64 conditional branches that do not use the flags as input
  (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
  selectors to not produce these instructions when speculation hardening
  is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
  the taint register X16 to be encoded in the SP register as value 0.
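
An illustrative C++ sketch of the masking idea (hypothetical, not the
pass's actual output):

  #include <cstdint>

  // Taint is all-ones on the architecturally-taken path and all-zeros
  // under miss-speculation, so AND-ing a loaded value with it yields
  // zero exactly when the access is transient.
  uint64_t load_hardened(const uint64_t *Ptr, uint64_t Taint) {
    return *Ptr & Taint;
  }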

Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
  x86-slh-lfence option. This may be more useful for the intrinsics-based
  approach than for the SLH approach to masking.
  Note that this pass already inserts the full speculation barriers if the
  function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
  could be done for some indirect branches, such as switch jump tables.

Differential Revision: https://reviews.llvm.org/D54896

llvm-svn: 349456
2018-12-18 08:50:02 +00:00
Martin Storsjo 8f0cb9c3a8 [AArch64] [MinGW] Allow enabling SEH exceptions
The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.

Differential Revision: https://reviews.llvm.org/D55748

llvm-svn: 349451
2018-12-18 08:32:37 +00:00
Kewen Lin 44ace92596 [PowerPC] Exploit power9 new instruction setb
Check the expected patterns feeding SELECT_CC, like:
   (select_cc lhs, rhs,  1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs,  1, -1, cc2), seteq)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs, -1,  1, cc2), seteq)
Further transform the sequence to a comparison + setb if one of these patterns hits.
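
For clarity, an illustrative C++ rendering of the three-way compare
these patterns encode; setb materializes the -1/0/1 result from a
single comparison:

  #include <cstdint>

  // -1 if Lhs < Rhs, 1 if Lhs > Rhs, 0 if equal.
  int setb_like(int64_t Lhs, int64_t Rhs) {
    return Lhs < Rhs ? -1 : (Lhs > Rhs ? 1 : 0);
  }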

Differential Revision: https://reviews.llvm.org/D53275

llvm-svn: 349445
2018-12-18 07:53:26 +00:00
Craig Topper 1ff7356f96 [X86] Const correct some helper functions in X86InstrInfo.cpp. NFC
llvm-svn: 349440
2018-12-18 04:58:05 +00:00
Kewen Lin 3dac1252da [PowerPC] Improve vec_abs on P9
Improve the current vec_abs support on P9: generate an ISD::ABS node for vector types,
combine the ABS node to a VABSD node in some special cases to make use of the P9 VABSD* instructions,
and custom lower to vsub(vneg later)+vmax when there is no combining opportunity.
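
A per-element C++ illustration of the fallback lowering (illustrative
only):

  #include <algorithm>
  #include <cstdint>

  // abs(x) == max(x, 0 - x): a vector vneg/vsub followed by vmax.
  // Negation is done in unsigned arithmetic to sidestep INT_MIN UB.
  int32_t abs_via_max(int32_t X) {
    return std::max(X, static_cast<int32_t>(0u - static_cast<uint32_t>(X)));
  }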

Differential Revision: https://reviews.llvm.org/D54783

llvm-svn: 349437
2018-12-18 03:16:43 +00:00
Simon Pilgrim 7e2975a44c [X86][SSE] Improve immediate vector shift known bits handling.
Convert VSRAI to VSRLI if the sign bit is known zero, and improve KnownBits output for all shift instructions.

Fixes the poor codegen comments in D55768.

llvm-svn: 349407
2018-12-17 22:09:47 +00:00
Wouter van Oortmerssen d3c544aa6e [WebAssembly] Fix assembler parsing of br_table.
Summary:
We use `variable_ops` in the tablegen defs to denote the list of
branch targets in `br_table`, but unlike other uses of `variable_ops`
(e.g. call) the these branch targets need to actually be encoded in the
instruction. The existing tables for `variable_ops` cause not operands
to be accepted by the assembly matcher.

Following the example of ARM:
2cc0a7da87/lib/Target/ARM/ARMInstrInfo.td (L550-L555)
we introduce a new operand type to capture this list, and we use the
same {} syntax as ARM as well to differentiate them from regular
integer operands.

Also removed definition and use of TSFlags in tablegen defs, since
`br_table` now has a non-variable_ops immediate operand, so the
previous logic of only the variable_ops arguments being labels didn't
make sense anymore.

Reviewers: dschuff, aheejin, sunfish

Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55401

llvm-svn: 349405
2018-12-17 22:04:44 +00:00
Craig Topper 8c9d772991 [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr.
These seem to have been missed when the other TBM instructions were added.

llvm-svn: 349404
2018-12-17 21:50:06 +00:00
Simon Pilgrim 6b5e0b7b2b [X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling.
First step towards adding more capable combines to fix comments in D55768.

llvm-svn: 349400
2018-12-17 21:36:17 +00:00
Craig Topper 728cbc0378 Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used.
This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift.

This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users.

This appears to fix the case in PR39968.
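
An illustrative C++ equivalence (not the DAG combine itself):

  #include <cstdint>

  // (X >> C) == 0 only depends on the bits the shift keeps, so the
  // compare can use an AND with C' = -1 << C and a TEST instead.
  bool srl_is_zero(uint32_t X, unsigned C) { return (X >> C) == 0; }
  bool and_is_zero(uint32_t X, unsigned C) { return (X & (~0u << C)) == 0; }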

llvm-svn: 349385
2018-12-17 20:02:16 +00:00
Simon Pilgrim 9274f17a5e [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000)
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.

I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.

Differential Revision: https://reviews.llvm.org/D55768

llvm-svn: 349374
2018-12-17 18:43:43 +00:00
Tim Northover 256a16d031 FastIsel: take care to update iterators when removing instructions.
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointing at dangling
memory.

llvm-svn: 349365
2018-12-17 17:25:53 +00:00
Petar Avramovic f9c9bc09ab [MIPS GlobalISel] Remove switch statement (fix r349346 for MSVC)
Temporarily remove switch statement without any case labels 
in function legalizeCustom in order to fix r349346 for MSVC.

llvm-svn: 349356
2018-12-17 15:12:53 +00:00
Tim Northover ae3b66b7b0 ARM: use acquire/release instruction variants when available.
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.

llvm-svn: 349355
2018-12-17 15:05:32 +00:00
Petar Avramovic b8276f2280 [MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD
Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32.

Differential Revision: https://reviews.llvm.org/D54580

llvm-svn: 349346
2018-12-17 12:31:07 +00:00
Alexandros Lamprineas 490ae11717 [AArch64] Re-run load/store optimizer after aggressive tail duplication
The Load/Store Optimizer runs before Machine Block Placement. At O3 the
Tail Duplication Threshold is set to 4 instructions and this can create
new opportunities for the Load/Store Optimizer. It seems worthwhile to
run it once again.

llvm-svn: 349338
2018-12-17 10:45:43 +00:00
Craig Topper fa4907d671 [X86] Fix bad operand lookup for cmov introduced in r349315
The CC is operand 2 not operand 3.

llvm-svn: 349330
2018-12-17 06:40:35 +00:00
Simon Pilgrim d0c9e43b1c [X86] Pull out constant splat rotation detection.
We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts.

llvm-svn: 349319
2018-12-16 19:46:04 +00:00
Craig Topper 10f8892837 [X86] Remove truncation handling from EmitTest. Replace it with a DAG combine.
I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that.

The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel.

The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.

llvm-svn: 349315
2018-12-16 18:35:55 +00:00
Sanjay Patel 13ac2f15b0 [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859)
This is part of fixing PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859

We have a crippled vector ISA, so we have to invert a typical fold and create min/max here.

As discussed in the bug report, we can probably do better by using saturating subtract when 
it's available, but we should have this improvement for the min/max patterns regardless.
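
An illustrative scalar identity behind the lowering (assuming the
constant can be incremented without wrapping):

  #include <algorithm>
  #include <cstdint>

  // With no unsigned vector compares, X > C can be tested as
  // max(X, C + 1) == X (valid for C != UINT32_MAX).
  bool ugt_via_max(uint32_t X, uint32_t C) {
    return std::max(X, C + 1) == X;
  }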

Alive proofs:
https://rise4fun.com/Alive/zsf
https://rise4fun.com/Alive/Qrl

Differential Revision: https://reviews.llvm.org/D55515

llvm-svn: 349304
2018-12-16 15:05:48 +00:00
Simon Pilgrim 52c982406e [X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI.
In preparation for converting to funnel shifts.

llvm-svn: 349286
2018-12-15 21:11:49 +00:00
Simon Pilgrim ef7b5949e5 [X86] Lower to SHLD/SHRD on slow machines for optsize
Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard.

llvm-svn: 349285
2018-12-15 19:43:44 +00:00
Simon Pilgrim 9831d4058c Fix -Wunused-variable warning. NFCI.
llvm-svn: 349265
2018-12-15 12:25:22 +00:00
Florian Hahn abe32c9125 [SILoadStoreOptimizer] Use std::abs to avoid truncation.
Using regular abs() causes the following warning

error: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value]
        (uint32_t)abs(Dist) > MaxDist) {
                  ^
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead

which causes a bot to fail:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18284/steps/bootstrap%20clang/logs/stdio

llvm-svn: 349224
2018-12-15 01:32:58 +00:00
Craig Topper 1fc257d97f [X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instructions that only modify the OF flag to the waiver list.
The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0 so we should have no issues with instructions that only use OF.

Though I don't think there's any reason we would read just OF after a compare with 0 anyway. So this probably isn't an observable change.

llvm-svn: 349223
2018-12-15 01:07:19 +00:00
Craig Topper 5c304eac41 [X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI
hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction.

After this patch we now do a result number check in both and rely on the caller to provide the result number.

This shouldn't change behavior it was just an odd difference between the two functions that I noticed.

llvm-svn: 349222
2018-12-15 01:07:16 +00:00
Artem Belevich 6d74bd638a [NVPTX] Lower instructions that expand into libcalls.
The change is an effort to split and refactor abandoned
D34708 into smaller parts.

Here the behaviour of unsupported instructions is changed
to match the behaviour of explicit intrinsics calls.
Currently LLVM crashes with:
> Assertion getInstruction() && "Not a call or invoke instruction!" failed.

With this patch LLVM produces a more sensible error message:
> Cannot select: ... i32 = ExternalSymbol'__foobar'

Author: Denys Zariaiev <denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55145

llvm-svn: 349213
2018-12-14 23:53:06 +00:00
Krzysztof Parzyszek 26d994f56e [Hexagon] Add patterns for shifts of v2i16
This fixes https://llvm.org/PR39983.

llvm-svn: 349202
2018-12-14 22:33:48 +00:00
Krzysztof Parzyszek c0fc0a9775 [Hexagon] Use IMPLICIT_DEF to any-extend 32-bit values to 64 bits
llvm-svn: 349199
2018-12-14 22:05:44 +00:00
Farhana Aleen ce095c564a [AMDGPU] Promote constant offset to the immediate by finding a new base with 13-bit constant offset from the nearby instructions.
Summary: Promote constant offset to immediate by recomputing the relative 13-bit offset from nearby instructions.
 E.g.
  s_movk_i32 s0, 0x1800
  v_add_co_u32_e32 v0, vcc, s0, v2
  v_addc_co_u32_e32 v1, vcc, 0, v6, vcc

  s_movk_i32 s0, 0x1000
  v_add_co_u32_e32 v5, vcc, s0, v2
  v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
  global_load_dwordx2 v[5:6], v[5:6], off
  global_load_dwordx2 v[0:1], v[0:1], off
  =>
  s_movk_i32 s0, 0x1000
  v_add_co_u32_e32 v5, vcc, s0, v2
  v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
  global_load_dwordx2 v[5:6], v[5:6], off
  global_load_dwordx2 v[0:1], v[5:6], off offset:2048

Author: FarhanaAleen

Reviewed By: arsenm, rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D55539

llvm-svn: 349196
2018-12-14 21:13:14 +00:00
Evandro Menezes ea9d90083f [AArch64] Simplify the scheduling predicates (NFC)
The instruction encodings make it unnecessary to distinguish extended W-form
from X-form instructions.

llvm-svn: 349185
2018-12-14 20:04:58 +00:00
Ehsan Amiri de1742c284 NFC. Adding an empty line to test the updated commit credentials.
llvm-svn: 349158
2018-12-14 16:19:02 +00:00
Diana Picus 02c8343c75 [ARM GlobalISel] Thumb2: casts between int and ptr
Mark as legal and add tests. Nothing special to do.

llvm-svn: 349147
2018-12-14 13:45:38 +00:00
Diana Picus 813af0d283 [ARM GlobalISel] Minor refactoring. NFCI
Refactor the ARMInstructionSelector to cache some opcodes in the
constructor instead of checking all the time if we're in ARM or Thumb
mode.

llvm-svn: 349143
2018-12-14 12:37:24 +00:00
Diana Picus 14dc3b2959 [ARM GlobalISel] Allow simple binary ops in Thumb2
Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM
and Thumb2.

Extract the legalizer tests for these opcodes into another file.

Add tests for the instruction selector.

llvm-svn: 349142
2018-12-14 11:58:14 +00:00
Craig Topper 257ce3871e [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes the VsetCC to be CSEd to N0 and the getSExtOrTrunc to be CSEd to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.

To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55459

llvm-svn: 349137
2018-12-14 08:28:24 +00:00
Craig Topper 178abc59ac [X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp.
This requires the two callers to manifest a 0 to make EmitCmp call EmitTest.

I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely.

llvm-svn: 349094
2018-12-13 23:55:30 +00:00
Evandro Menezes 6fe51ac973 [AArch64] Fix Exynos predicates (NFC)
Fix the logic in the definition of `ExynosShiftExPred` as a more
specific version of `ExynosShiftPred`.  But, since `ExynosShiftExPred` is
not used yet, this change has no functional effect.

llvm-svn: 349091
2018-12-13 23:19:46 +00:00
Aakanksha Patil bc568766b2 Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute
This patch breaks RADV (and probably RadeonSI as well)

llvm-svn: 349084
2018-12-13 21:23:12 +00:00
Matt Arsenault 934e534c47 AMDGPU/GlobalISel: Legalize/regbankselect block_addr
llvm-svn: 349081
2018-12-13 20:34:15 +00:00
Mircea Trofin 41c729e78e [llvm] Address base discriminator overflow in X86DiscriminateMemOps
Summary:
Macros are expanded on a single line. In case of large expansions,
with sufficiently many instructions with memory operands (and when
-fdebug-info-for-profiling is requested), we may be unable to generate
new base discriminator values - new values overflow (base
discriminators may not be larger than 2^12).

This CL warns instead of asserting in such a case. A subsequent CL
will add APIs to check for overflow before creating new debug info.

See https://bugs.llvm.org/show_bug.cgi?id=39890

Reviewers: davidxl, wmi, gbedwell

Reviewed By: davidxl

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55643

llvm-svn: 349075
2018-12-13 19:40:59 +00:00
Simon Pilgrim b5aaa673c6 [X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 349057
2018-12-13 16:39:29 +00:00
Simon Pilgrim b0b2f1503a [X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243)
There are still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.

llvm-svn: 349052
2018-12-13 15:50:31 +00:00
Daniel Cederman 77611426e1 [Sparc] Add membar assembler tags
Summary: The Sparc V9 membar instruction can enforce different types of
memory orderings depending on the value in its immediate field.  In the
architectural manual the type is selected by combining different assembler
tags into a mask. This patch adds support for these tags.

Reviewers: jyknight, venkatra, brad

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D53491

llvm-svn: 349048
2018-12-13 15:29:12 +00:00
Simon Pilgrim ba91ff4a86 [X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243)
llvm-svn: 349047
2018-12-13 15:23:09 +00:00
Daniel Cederman b5d284408e [Sparc] Use float register for integer constrained with "f" in inline asm
Summary:
Constraining an integer value to a floating point register using "f"
causes an llvm_unreachable to trigger. This patch allows i32 integers
to be placed in a single precision float register and i64 integers to
be placed in a double precision float register. This matches the behavior
of GCC.

For other types the llvm_unreachable is removed to instead trigger an
error message that points out the offending line.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D51614

llvm-svn: 349045
2018-12-13 15:13:29 +00:00
Jinsong Ji c7b43b94ce [PowerPC][NFC] Sorting out Pseudo related classes to avoid confusion
There are several kinds of pseudo instructions in the PowerPC backend,
e.g.:

* ISel pseudo instructions, which have `let usesCustomInserter = 1` in the td files;
  ExpandISelPseudos -> EmitInstrWithCustomInserter will deal with them.
* Post-RA pseudo instructions, which have `let isPseudo = 1` in the td files, or standard pseudos (SUBREG_TO_REG, COPY, etc.);
  ExpandPostRAPseudos -> expandPostRAPseudo will expand them.
* Multi-instruction pseudo operations; PPCAsmPrinter::EmitInstruction will expand them.
* Pseudo instructions in the CodeEmitter, which have an encoding of 0.

Currently, in td files, especially PPCInstrVSX.td, 
we did not distinguish Post-RA pseudo instruction and Pseudo instruction in CodeEmitter very clearly.

This patch is to

* Rename Pseudo<> class to PPCEmitTimePseudo, which means encoding of 0 in CodeEmitter
* Introduce new class PPCPostRAExpPseudo <> for previous PostRA Pseudo
* Introduce new class PPCCustomInserterPseudo <> for previous Isel Pseudo

Differential Revision: https://reviews.llvm.org/D55143

llvm-svn: 349044
2018-12-13 15:12:57 +00:00
Simon Pilgrim 7c84f7ae3a [X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI.
Merged the repeated code into a single if().

llvm-svn: 349040
2018-12-13 14:51:28 +00:00
Jonas Paulsson e79b1b986d [SystemZ] Pass copy-hinted regs first from getRegAllocationHints().
When computing register allocation hints for a GRX32Bit register, make sure
that any of the hinted registers that are also copy hints are returned first
in the list.

Review: Ulrich Weigand.
llvm-svn: 349037
2018-12-13 14:37:05 +00:00
Simon Pilgrim 320fd7383f [X86][BWI] Don't custom lower vXi8 rotations.
We always expand to shifts anyhow - test changes are just different scheduling only.

llvm-svn: 349034
2018-12-13 13:44:33 +00:00
Chen Zheng 9c6fa536e0 [PowerPC] intrinsic llvm.eh.sjlj.setjmp should not have flag isBarrier.
Differential Revision: https://reviews.llvm.org/D55499

llvm-svn: 349029
2018-12-13 12:25:20 +00:00
Simon Pilgrim ab973a45b9 [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner
Remove common code from custom lowering (code is still safe if somehow a zero value gets used).

llvm-svn: 349028
2018-12-13 12:23:32 +00:00
Diana Picus 99cd644b6c [ARM GlobalISel] Support exts and truncs for Thumb2
Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for
them in the instruction selector. This uses handwritten code again
because the patterns that are generated with TableGen are tuned for what
the DAG combiner would produce and not for simple sext/zext nodes.
Luckily, we only need to update the opcodes to use the Thumb2 variants,
everything else can be reused from ARM.

llvm-svn: 349026
2018-12-13 12:06:54 +00:00
Simon Pilgrim 77fc551d1a [TargetLowering] Add ISD::ROTL/ROTR vector expansion
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.

Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.

Begun removing x86 vector rotate custom lowering to use the expansion.
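
The expansion follows the usual shift-based identity; a scalar C++
sketch (illustrative only):

  #include <cstdint>

  // rotl(X, Amt) == (X << Amt) | (X >> (32 - Amt)), with the amount
  // reduced modulo the bit width so both shifts stay in range.
  uint32_t rotl32(uint32_t X, unsigned Amt) {
    Amt &= 31;
    return (X << Amt) | (X >> ((32 - Amt) & 31));
  }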

llvm-svn: 349025
2018-12-13 11:20:48 +00:00
Alex Bradbury 919f5fb8ca [RISCV] Add support for the various RISC-V FMA instruction variants
Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd).

The criterion for choosing whether a fused add or subtract is used, as well
as whether the product is negated, is whether some of the arguments to the
llvm.fma.* intrinsic are negated. In the tests, extraneous fadd
instructions were added to avoid the negation being performed using a xor
trick, which prevented the proper FMA forms from being selected and thus
tested.

The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 -
rs3), but they should be correct. The misleading names were inherited from
MIPS, where the negation happens after computing the sum.
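
For reference, the semantics as scalar C++ (matching the description
above; note the hardware fuses the multiply-add without intermediate
rounding, which this sketch does not model):

  // fnmadd.s computes -(rs1 * rs2) - rs3, despite the name suggesting
  // a negated add; the naming was inherited from MIPS.
  float fnmadd_s(float Rs1, float Rs2, float Rs3) {
    return -(Rs1 * Rs2) - Rs3;
  }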

The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions,
as that depends on TargetLowering::isFMAFasterThanFMulAndFAdd.

Some comments in the test files about what types of instructions are
tested there were updated, to better reflect the current content of those
test files.

Differential Revision: https://reviews.llvm.org/D54205
Patch by Luís Marques.

llvm-svn: 349023
2018-12-13 10:49:05 +00:00
Arnaud A. de Grandmaison dfe861087d [AArch64] Catch some more CMN opportunities.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33486

llvm-svn: 349022
2018-12-13 10:31:32 +00:00
Matt Arsenault 577b9fc543 AMDGPU/GlobalISel: Legalize f64 fadd/fmul
llvm-svn: 349014
2018-12-13 08:27:48 +00:00
Matt Arsenault f38f483bef AMDGPU/GlobalISel: RegBankSelect some simple operations
llvm-svn: 349012
2018-12-13 08:23:51 +00:00
Craig Topper a048d58de7 [X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC
llvm-svn: 349007
2018-12-13 06:14:25 +00:00
Stanislav Mekhanoshin d933c2ced7 [AMDGPU] Fix build failure, second attempt
Some compilers complain that the variable is captured and some
complain when it is not. Switch to [&].

llvm-svn: 349006
2018-12-13 05:52:11 +00:00
Stanislav Mekhanoshin 5225746e03 [AMDGPU] Fix build failure
Fixed error 'lambda capture 'CondReg' is not required to be captured
for this use'.

llvm-svn: 349005
2018-12-13 05:21:25 +00:00
Stanislav Mekhanoshin 6071e1aa58 [AMDGPU] Simplify negated condition
Optimize sequence:

  %sel = V_CNDMASK_B32_e64 0, 1, %cc
  %cmp = V_CMP_NE_U32 1, %1
  $vcc = S_AND_B64 $exec, %cmp
  S_CBRANCH_VCC[N]Z
=>
  $vcc = S_ANDN2_B64 $exec, %cc
  S_CBRANCH_VCC[N]Z

It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the
rebuildSetCC().

Differential Revision: https://reviews.llvm.org/D55402

llvm-svn: 349003
2018-12-13 03:17:40 +00:00
Craig Topper d1c61861dd [X86] Don't emit MULX by default with BMI2
MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer.

Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints, like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what, if any, rules they have for using it.

Differential Revision: https://reviews.llvm.org/D55565

llvm-svn: 348975
2018-12-12 21:21:31 +00:00
Aakanksha Patil 729309cc89 [AMDGPU] Support for "uniform-work-group-size" attribute
Updated the annotate-kernel-features pass to support the propagation of the uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have a uniform work group size, depending on the value of the attribute.

Differential Revision: https://reviews.llvm.org/D50200

llvm-svn: 348971
2018-12-12 20:49:17 +00:00
Scott Linder f5b36e56fb [AMDGPU] Emit MessagePack HSA Metadata for v3 code object
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in MessagePack in the code object.

Differential Revision: https://reviews.llvm.org/D48179

llvm-svn: 348963
2018-12-12 19:39:27 +00:00
Craig Topper 4937adf75f [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input.
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.

I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.

Differential Revision: https://reviews.llvm.org/D55414

llvm-svn: 348959
2018-12-12 19:20:21 +00:00
Simon Pilgrim eb508f8ccb [SelectionDAG] Add a generic isSplatValue function
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.

It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.

A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).

I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.

Differential Revision: https://reviews.llvm.org/D55426

llvm-svn: 348953
2018-12-12 18:32:29 +00:00
Artem Belevich f802b9324a [NVPTX] do not rely on cached subtarget info.
If a module has function references, but no functions
themselves, we may end up never calling runOnMachineFunction
and therefore would never initialize nvptxSubtarget field
which would eventually cause a crash.

Instead of relying on nvptxSubtarget being initialized by
one of the methods, retrieve subtarget info directly.

Differential Revision: https://reviews.llvm.org/D55580

llvm-svn: 348952
2018-12-12 18:31:04 +00:00
Sanjay Patel 44eaa492b8 [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. 
That allows us to combine add ops with register moves and save some instructions. This is 
another step towards allowing add truncation in generic DAGCombiner (see D54640).

Differential Revision: https://reviews.llvm.org/D55494

llvm-svn: 348946
2018-12-12 17:58:27 +00:00
Neil Henning 76504a4c5e [AMDGPU] Extend the SI Load/Store optimizer to combine more things.
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores. This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.

Differential Revision: https://reviews.llvm.org/D54042

llvm-svn: 348937
2018-12-12 16:15:21 +00:00
Simon Atanasyan fa020082e4 [mips] Enable using of integrated assembler in all cases.
llvm-svn: 348934
2018-12-12 15:32:03 +00:00
Piotr Sobczak 3732b4ce25 [AMDGPU] Set metadata access for explicit section
Summary:
This patch provides a means to set Metadata section kind
for a global variable, if its explicit section name is
prefixed with ".AMDGPU.metadata."
This could be useful to make the global variable go to
an ELF section without any section flags set.

Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye

Reviewed By: dstuttard, kzhuravl

Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye

Differential Revision: https://reviews.llvm.org/D55267

llvm-svn: 348922
2018-12-12 11:20:04 +00:00
Diana Picus 59720b422a [ARM GlobalISel] Select load/store for Thumb2
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.

Also mark them as legal and extract their legalizer test cases to a new
test file.

llvm-svn: 348920
2018-12-12 10:32:15 +00:00
Jonas Paulsson 896775c2d3 [SystemZ] Minor cleanup of SchedModels
Some fixes of a few InstRWs for z13 and z14.

Review: Ulrich Weigand
llvm-svn: 348917
2018-12-12 08:26:24 +00:00
Craig Topper 1fe466689b [X86] Combine vpmovdw+vpacksswb into vpmovdb.
This is similar to the combine we already have for vpmovdw+vpackuswb.

llvm-svn: 348910
2018-12-12 05:56:01 +00:00
Mandeep Singh Grang 802dc40f41 [COFF, ARM64] Emit COFF function header
Summary:
Emit COFF header when printing out the function. This is important as the
header contains two important pieces of information: the storage class for the
symbol and the symbol type information. This bit of information is required for
the linker to correctly identify the type of symbol that it is dealing with.

This patch mimics X86 and ARM COFF behavior for function header emission.

Reviewers: rnk, mstorsjo, compnerd, TomTan, ssijaric

Reviewed By: mstorsjo

Subscribers: dmajor, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55535

llvm-svn: 348875
2018-12-11 18:36:14 +00:00
Craig Topper b51283bfd7 Fix incorrect imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare
Summary:
When doing X86CondBrFolding::analyzeCompare, it may meet a SUB32ri instruction that uses a global address for its operand, as below:
  %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
  JNE_1 %bb.41, implicit $eflags

so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct. Change the assert to an if, making X86CondBrFolding::analyzeCompare return false as not finding the compare in this case.

Patch by Jianping Chen

Reviewers: smaslov, LuoYuanke, liutianle, Jianping

Reviewed By: Jianping

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D54250

llvm-svn: 348853
2018-12-11 15:32:14 +00:00
Sanjay Patel 05e36982dd [x86] clean up code for converting 16-bit ops to LEA; NFC
As discussed in D55494, we want to extend this to handle 8-bit
ops too, but that could be extended further to enable this on
32-bit systems too.

llvm-svn: 348851
2018-12-11 15:29:40 +00:00
Sanjay Patel 9765ba5f86 [x86] remove dead code for 16-bit LEA formation; NFC
As discussed in:
D55494
...this code has been disabled/dead for a long time (the code references
Athlon and Pentium 4), and there's almost no chance that it will be used 
given the last decade of uarch evolution. Also, in SDAG we promote 16-bit 
ops to 32-bit, so there's almost no way to test this code any more.

llvm-svn: 348845
2018-12-11 14:05:03 +00:00
Martell Malone 0b3ddec7ed [PPC][NFC] store operands are dst not src
Differential Revision: https://reviews.llvm.org/D55502

llvm-svn: 348826
2018-12-11 03:14:56 +00:00
Heejin Ahn be5e5874f6 [WebAssembly] Add '.eventtype' directive support
Summary:
This patch supports `.eventtype` directive printing and parsing in the
same syntax with `.functype`.

Reviewers: aardappel, sbc100

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55353

llvm-svn: 348818
2018-12-11 01:11:04 +00:00
Heejin Ahn 21d45a2c98 [WebAssembly] TargetStreamer cleanup (NFC)
Summary:
- Unify mixed argument names (`Symbol` and `Sym`) to `Sym`
- Changed `MCSymbolWasm*` argument of `emit***` functions to `const
  MCSymbolWasm*`. It seems not very intuitive that emit function in the
  streamer modifies symbol contents.
- Moved empty function bodies to the header
- clang-format

Reviewers: aardappel, dschuff, sbc100

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55347

llvm-svn: 348816
2018-12-11 00:53:59 +00:00
Aditya Nandakumar cef44a2342 [GISel]: Refactor MachineIRBuilder to allow passing additional parameters to build Instrs
https://reviews.llvm.org/D55294

Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as

B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2 defs + 1 src for
unmerge vs 1 def + 2 srcs).
This patch refactors the buildInstr to

buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like

B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual, so other
builders only have to override buildInstr (for, say, constant
folding/CSEing).

Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.

llvm-svn: 348815
2018-12-11 00:48:50 +00:00
Krzysztof Parzyszek 9f003f9262 [Hexagon] Couple of fixes in optimize addressing mode
- Check if an operand is an immediate before calling getImm. Some operands
  that take constant values can actually have global symbols or other
  constant expressions.
- When a load-constant instruction can be folded into users, make sure to
  only delete it when all users have been successfully converted.

llvm-svn: 348802
2018-12-10 21:56:04 +00:00
Krzysztof Parzyszek c1b2d5905a Revert "[Hexagon] Check if operand is an immediate before getImm"
This reverts r348787. The patch wasn't quite correct.

llvm-svn: 348792
2018-12-10 19:30:08 +00:00
Amara Emerson 5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Krzysztof Parzyszek c6e9380a56 [Hexagon] Check if operand is an immediate before getImm
llvm-svn: 348787
2018-12-10 18:39:47 +00:00
Krzysztof Parzyszek 914f2d1c46 [Hexagon] Add patterns for any_extend from i1 and short vectors of i1
llvm-svn: 348785
2018-12-10 18:36:06 +00:00
Sanjay Patel 134f56e702 [x86] fix formatting; NFC
This should really be generalized to allow increment and/or
we should replace it by using ISD::matchUnaryPredicate().
See D55515 for context.

llvm-svn: 348776
2018-12-10 17:23:44 +00:00
Evandro Menezes 53f0d41dc4 [AArch64] Refactor the Exynos scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, for the Exynos processors.

Differential revision: https://reviews.llvm.org/D55345

llvm-svn: 348774
2018-12-10 17:17:26 +00:00
Neil Henning e448351b77 [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D.
This commit changes which l1 flush instruction is used for AMDPAL and
MESA3d workloads to flush the entire l1 cache instead of just the
volatile lines.

Differential Revision: https://reviews.llvm.org/D55367

llvm-svn: 348771
2018-12-10 16:35:53 +00:00
Evandro Menezes 1ec1a0d342 [AArch64] Refactor the scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  Augment the
number of helper predicates used by processor specific predicates.

Differential revision: https://reviews.llvm.org/D55375

llvm-svn: 348768
2018-12-10 16:24:30 +00:00
Tim Corringham 2faadb15f4 [AMDGPU] Add new Mode Register pass - minor fix
Trivial change to add parentheses to an expression to avoid a
sanitizer error in SIModeRegister.cpp, which was committed earlier.

llvm-svn: 348767
2018-12-10 16:23:30 +00:00
Cameron McInally 872ed41a1e [AVX512] Update typo in comment
Should be "Sae" for "Suppress All Exceptions".

NFC

llvm-svn: 348763
2018-12-10 15:21:35 +00:00
Vladimir Stefanovic 4433f93afe [mips][mc] Emit R_{MICRO}MIPS_JALR when expanding jal to jalr
When replacing jal with jalr, also emit '.reloc R_MIPS_JALR' (R_MICROMIPS_JALR
for micromips). The linker might then be able to turn jalr into a direct
call.
Add '-mips-jalr-reloc' to enable/disable this feature (default is true).

Differential revision: https://reviews.llvm.org/D55292

llvm-svn: 348760
2018-12-10 15:07:36 +00:00
Tim Corringham 4c4d2fe280 [AMDGPU] Add new Mode Register pass
A new pass to manage the Mode register.

Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.

The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.

llvm-svn: 348754
2018-12-10 12:06:10 +00:00
Nikita Popov e79477895e [X86] Fix AvoidStoreForwardingBlocks pass for negative displacements
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926.

The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.
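
A small C++ illustration of the difference, with hypothetical
displacements:

  #include <cstdint>
  #include <cstdlib>

  // With LdDisp1 = -8 and LdDisp2 = 8, oldSize yields 0 (bytes are
  // skipped), while the correct size of the first copy is 16.
  int64_t oldSize(int64_t D1, int64_t D2) {
    return std::abs(std::abs(D2) - std::abs(D1));
  }
  int64_t newSize(int64_t D1, int64_t D2) {
    return D2 - D1; // D1 <= D2 is already guaranteed at this point
  }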

Differential Revision: https://reviews.llvm.org/D55485

llvm-svn: 348750
2018-12-10 10:16:50 +00:00
Craig Topper 02b614abc8 [X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic.
Both intrinsics do the exact same thing so we really only need one.

Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the signature that was previously new in 8.0. I've also renamed the subborrow intrinsic for consistency.

llvm-svn: 348737
2018-12-10 06:07:50 +00:00
Brian Gesiak b963c5150d [AMDGPU] Fix discarded result of addAttribute
Summary:
`llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods
defined on these classes, such as `addAttribute`, return a new immutable
object with the attribute added. In https://reviews.llvm.org/D55217 I attempted
to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since
calling these methods has no side-effects, and so ignoring the result
that is returned is almost certainly a programmer error.

However, committing the change resulted in new warnings in the AMDGPU target.
The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436
attempts to add the readonly and nounwind attributes to simplified
library functions, but instead calls the `addAttribute` methods and
ignores the result.

Modify the simplify libcalls pass to actually add the nounwind and
readonly attributes. Also update the simplify libcalls test to assert
that these attributes are actually being set.

Reviewers: rampitec, vpykhtin, rnk

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55435

llvm-svn: 348732
2018-12-09 21:56:50 +00:00
Craig Topper 2b09d17d93 [X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB.
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother.
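
An illustrative C++ sketch of why the plain ADD suffices when the
carry-in is known zero:

  #include <cstdint>

  // a + b + 0 == a + b, and a flag-setting ADD already produces the
  // carry-out (the sum wrapped below an operand), so no ADC is needed.
  uint32_t add_with_carry_out(uint32_t A, uint32_t B, bool &CarryOut) {
    uint32_t Sum = A + B;
    CarryOut = Sum < A;
    return Sum;
  }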

This should go a long way towards fixing PR24545.

llvm-svn: 348727
2018-12-09 18:02:37 +00:00
Nico Weber b961661977 Remove unneeded dependency from lib/Target/X86/Utils/ to lib/IR (aka Core).
The dependency was added in r213995 in response to r213986 which did make
X86/Utils depend on IR, but r256680 later removed that dependency again.

llvm-svn: 348724
2018-12-09 15:15:13 +00:00
Sanjay Patel 19bc850220 [x86] don't try to convert add with undef operands to LEA
The existing code tries to handle an undef operand while transforming an add to an LEA, 
but it's incomplete because we will crash on the i16 test with the debug output shown below. 
It's better to just give up instead. Really, GlobalIsel should have folded these before we 
could get into trouble.

# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected

bb.0 (%ir-block.0):
  liveins: $edi
  %1:gr32 = COPY killed $edi
  %0:gr16 = COPY %1.sub_16bit:gr32
  %5:gr64_nosp = IMPLICIT_DEF
  %5.sub_16bit:gr64_nosp = COPY %0:gr16
  %6:gr64_nosp = IMPLICIT_DEF
  %6.sub_16bit:gr64_nosp = COPY %2:gr16
  %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
  %3:gr16 = COPY killed %4.sub_16bit:gr32
  $ax = COPY killed %3:gr16
  RET 0, implicit killed $ax

# End machine code for function add_undef_i16.

*** Bad machine code: Reading virtual register without a def ***
- function:    add_undef_i16
- basic block: %bb.0  (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1:   %2:gr16
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D54710

llvm-svn: 348722
2018-12-09 14:40:37 +00:00
Simon Pilgrim e9d8275e43 [X86] Extend pfm counter coverage for llvm-exegesis
Extension to rL348617; it turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name - so I've added a few more counters that I could find in the libpfm4 source code (and fixed a typo in the knl/knm retired_uops counter - which uses 'all' instead of 'any').

llvm-svn: 348721
2018-12-09 13:45:15 +00:00
Matt Arsenault b5613ecf17 AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments
We were still using the rounded down offset and alignment even though
they aren't handled because you can't trivially bitcast the loaded
value.

llvm-svn: 348658
2018-12-07 22:12:17 +00:00
Krzysztof Parzyszek b754f7a2e0 [Hexagon] Fix post-ra expansion of PS_wselect
llvm-svn: 348655
2018-12-07 22:00:53 +00:00
Simon Pilgrim 44dfd81d01 Fix unused variable warning. NFCI.
llvm-svn: 348649
2018-12-07 21:44:25 +00:00
Heejin Ahn 7ce5edf1ea [WebAssembly] clang-format/clang-tidy AsmParser (NFC)
Summary:
- LLVM clang-format style doesn't allow one-line ifs.
- LLVM clang-tidy style says method names should start with a lowercase
  letter. But currently WebAssemblyAsmParser's parent class
  MCTargetAsmParser is mixing lowercase and uppercase method names
  itself so overridden methods cannot be renamed now.
- Changed else ifs after returns to ifs.
- Added some newlines for readability.

Reviewers: aardappel, sbc100

Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55350

llvm-svn: 348648
2018-12-07 21:35:37 +00:00
Heejin Ahn d2fd70991d Delete registerScope function
`unregisterScope()` is not currently used, so removing it.

llvm-svn: 348647
2018-12-07 21:31:14 +00:00
Simon Pilgrim 9b8fdab26c [X86] Replace instregex with instrs list. NFCI.
llvm-svn: 348626
2018-12-07 18:47:05 +00:00
Matt Arsenault ce2e053134 AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load
llvm-svn: 348625
2018-12-07 18:41:39 +00:00
Craig Topper ba3ab78291 [X86] Initialize and Register X86CondBrFoldingPass
Make it possible to run X86CondBrFoldingPass with the --run-pass option; this allows testing a wrong assertion in the analyzeCompare function for SUB32ri when its operand is not an immediate.

Patch by Jianping Chen

Differential Revision: https://reviews.llvm.org/D55412

llvm-svn: 348620
2018-12-07 18:10:34 +00:00
Matt Arsenault ca8eb0b672 AMDGPU: Remove llvm.SI.tbuffer.store
llvm-svn: 348619
2018-12-07 18:03:47 +00:00
Simon Pilgrim 6155b32250 [X86] Improve pfm counter coverage for llvm-exegesis
This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports.

Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code).

The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings.

These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs.

Differential Revision: https://reviews.llvm.org/D55432

llvm-svn: 348617
2018-12-07 17:48:40 +00:00
Matt Arsenault 3ff764a944 AMDGPU: Remove llvm.SI.buffer.load.dword
llvm-svn: 348616
2018-12-07 17:46:20 +00:00
Matt Arsenault aa9bcd56b1 AMDGPU: Remove llvm.AMDGPU.kill
This is the last of the old AMDGPU intrinsics.

llvm-svn: 348615
2018-12-07 17:46:16 +00:00
Graham Sellers b297379ef0 [AMDGPU] Shrink scalar AND, OR, XOR instructions
This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.

It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x

In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7fffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31).

llvm-svn: 348601
2018-12-07 15:33:21 +00:00
Tim Northover 4bf394be3a ARM: use correct offset from base pointer (r6) in call frame regions.
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.

llvm-svn: 348591
2018-12-07 13:43:55 +00:00
David Green ca29c271d2 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Simon Pilgrim 74c371da7b Fix gcc7.3 -Wparentheses warning. NFCI.
llvm-svn: 348581
2018-12-07 11:10:03 +00:00
Simon Pilgrim 9c7d85bc62 [X86] Add ivybridge to llvm-exegesis PFM counter mappings
llvm-svn: 348575
2018-12-07 09:27:35 +00:00
Zi Xuan Wu cf4d477b0b [PowerPC] Fix assert from machine verify pass that missing undef register flag
Fix an assertion about using an undefined physical register in the machine
instruction verifier pass. The reason is that the undef register flag is
missing when the If Conversion pass performs its transformation.

```
Bad machine code: Using an undefined physical register 
- function:    func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0:   killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```

There are also other existing testcases with the same issue, so I added the -verify-machineinstrs option to enable verification.

Differential Revision: https://reviews.llvm.org/D55408

llvm-svn: 348566
2018-12-07 05:25:16 +00:00
Craig Topper 2c7a9476e0 [X86] Directly create ADC/SBB nodes instead of using ADD/SUB with (and SETCC_CARRY, 1)
This addresses a FIXME and, I think, avoids depending on an isel pattern match. I've removed the isel patterns too since we have no lit tests left that cover them. Hopefully that really means they are unused.

I'm trying to decide if we need SETCC_CARRY. This removes one of its usages.

Differential Revision: https://reviews.llvm.org/D55355

llvm-svn: 348536
2018-12-06 22:26:59 +00:00
Evandro Menezes 799b76eae2 [AArch64] Fix Exynos predicate
Fix predicate for arithmetic instructions with shift and/or extend.

llvm-svn: 348510
2018-12-06 18:25:37 +00:00
Simon Pilgrim bb650daeaf [X86] Refactored IsSplatVector to use switch. NFCI.
Initial step towards making the function more generic (and probably move into SelectionDAG).

This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate).

llvm-svn: 348498
2018-12-06 16:29:14 +00:00
Alexey Bataev 2e1a782189 [DEBUGINFO, NVPTX] Disable emission of ',debug' option if only debug directives are allowed.
Summary:
If only the output of debug directives is requested, we should drop
emission of the ',debug' option from the target directive. This is
required to support the nvprof profiler.

Reviewers: echristo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46061

llvm-svn: 348497
2018-12-06 16:25:35 +00:00
Alexey Bataev 64ad0ad5ed [DEBUGINFO, NVPTX]Emit last debugging directives.
Summary:
We may end up with debug directives that have not been emitted by the end
of the module emission. This patch fixes the problem by emitting those
last directives at the end of the module emission.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D54320

llvm-svn: 348495
2018-12-06 16:02:09 +00:00
Diogo N. Sampaio 9c9067316b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate the
complex numbers extension is to target the v8.3
architecture; after the patch this extension
can be enabled separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html


Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

--

It was reverted in rL348249 due to a buildbot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

The problem seems to be that FileCheck behaves
differently on Windows and Linux. This new patch
splits the test file into multiple files,
and does more exact pattern matching in an attempt
to circumvent the issue.

llvm-svn: 348493
2018-12-06 15:39:17 +00:00
Nicolai Haehnle ca4a32945f AMDGPU: Generate VALU ThreeOp Integer instructions
Summary:
Original patch by: Fabian Wahlster <razor@singul4rity.com>

Change-Id: I148f692a88432541fad468963f58da9ddf79fac5

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, b-sumner, llvm-commits

Differential Revision: https://reviews.llvm.org/D51995

llvm-svn: 348488
2018-12-06 14:33:40 +00:00
Valery Pykhtin f479fbba5f [AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default
Turn the combiner back off, as there are failures, until the issue is fixed.

Differential revision: https://reviews.llvm.org/D55314

llvm-svn: 348487
2018-12-06 14:20:02 +00:00
Diana Picus 1027249ec9 [ARM GlobalISel] Nothing is legal for Thumb
...yet!

A lot of the current code should be shared for arm and thumb mode, but
until we add tests and work out some of the details (e.g. checking the
correct subtarget feature for G_SDIV) it's safer to bail out as early as
possible for thumb targets.

This should have arguably been part of r348347, which allowed Thumb
functions to be handled by the IR Translator.

llvm-svn: 348472
2018-12-06 09:26:14 +00:00
Craig Topper 6a6d77b851 [X86] Remove some leftover code for handling an i1 setcc type. NFC
We should only need to handle i8 now.

llvm-svn: 348460
2018-12-06 07:00:02 +00:00
Matthias Braun d041212c07 AArch64: Fix invalid CCMP emission
The code emitting AND-subtrees used to check whether any of the operands
was an OR in order to figure out if the result needs to be negated.
However the OR could be hidden in further subtrees and not immediately
visible.

Change the code so that canEmitConjunction() determines whether the
result of the generated subtree needs to be negated. Cleanup emission
logic to use this. I also changed the code a bit to make all negation
decisions early before we actually emit the subtrees.

This fixes http://llvm.org/PR39550

Differential Revision: https://reviews.llvm.org/D54137

llvm-svn: 348444
2018-12-06 01:40:23 +00:00
Krzysztof Parzyszek 8eb394d764 [Hexagon] Add intrinsics for Hexagon V66
llvm-svn: 348413
2018-12-05 21:14:51 +00:00
Krzysztof Parzyszek 545a68ca4b [Hexagon] Add instruction definitions for Hexagon V66
llvm-svn: 348411
2018-12-05 21:01:07 +00:00
Krzysztof Parzyszek 13a9cf28a1 [Hexagon] Foundation of support for Hexagon V66
llvm-svn: 348407
2018-12-05 20:18:09 +00:00
Aditya Nandakumar f75d4f329c [GISel]: Provide standard interface to observe changes in GISel passes
https://reviews.llvm.org/D54980

This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides a setObserver method.

Reviewed by: vkeles.

llvm-svn: 348406
2018-12-05 20:14:52 +00:00
Evandro Menezes 4b5707550c [AArch64] Reword description of feature (NFC)
Reword the description of the feature that enables custom handling of cheap
instructions.

llvm-svn: 348398
2018-12-05 18:42:57 +00:00
Chandler Carruth 71c14a36a2 [SLH] Fix a nasty bug in SLH.
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.

The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.

The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.

llvm-svn: 348374
2018-12-05 15:42:11 +00:00
Valery Pykhtin 5b4db77b13 [AMDGPU]: Turn on the DPP combiner by default
Differential revision: https://reviews.llvm.org/D55314

llvm-svn: 348371
2018-12-05 15:21:17 +00:00
Simon Pilgrim 32483668d7 [X86][SSE] Begun adding modulo rotate support to LowerRotate
Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions).

I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.

llvm-svn: 348366
2018-12-05 14:46:37 +00:00
Simon Pilgrim 180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
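
For reference, a rotate is just a funnel shift with both inputs equal; a minimal IR sketch using the generic intrinsic (the wrapper function name is illustrative):

```
declare i32 @llvm.fshl.i32(i32, i32, i32)

; fshl concatenates the first operand (high) with the second (low) and
; shifts left by %c modulo 32, so equal operands give a rotate-left
define i32 @rotl_example(i32 %x, i32 %c) {
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %c)
  ret i32 %r
}
```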

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Diana Picus 8a1b4f57c9 [ARM GlobalISel] Implement call lowering for Thumb2
The only things that are different from arm are:
* different opcodes for calls and returns
* Thumb calls take predicate operands

llvm-svn: 348347
2018-12-05 10:35:28 +00:00
Saleem Abdulrasool efd2cb8a0d AArch64: support funclets in fastcall and swift_call
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64.  This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function.  This was detected
while trying to self-host clang on Windows ARM64.

llvm-svn: 348337
2018-12-05 07:09:20 +00:00
Amara Emerson 8547f4fb7f [AArch64][GlobalISel] Re-enable selection of volatile loads.
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.

Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.

The old test case should still work as expected.

llvm-svn: 348320
2018-12-05 00:03:09 +00:00
Saleem Abdulrasool a9248fdfab AArch64: clean up some whitespace in Windows CC (NFC)
Drive-by cleanup for Windows ARM64 variadic CC (NFC).

llvm-svn: 348310
2018-12-04 22:19:29 +00:00
Nirav Dave 9c593e8676 [AVR] Silence fallthrough warning. NFC.
llvm-svn: 348304
2018-12-04 21:41:52 +00:00
Stefan Pintilie 46f840f286 [PowerPC] Make no-PIC default to match GCC - LLVM
Change the default for PowerPC LE to -fno-PIC.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 348298
2018-12-04 20:14:57 +00:00
Nirav Dave ce26c27b2a [SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI.
llvm-svn: 348288
2018-12-04 17:59:43 +00:00
Matt Arsenault 0f414b81cc AMDGPU: Add f32 vectors to SGPR register classes
llvm-svn: 348286
2018-12-04 17:51:36 +00:00
Simon Pilgrim 07843640d5 [X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK
Moves the existing SimplifyDemandedBits call out of combineMOVMSK and adds a SimplifyDemandedVectorElts call based on the sign bits we need.

llvm-svn: 348282
2018-12-04 16:52:32 +00:00
Krzysztof Parzyszek 9fc0a2fe30 [Hexagon] Remove unused checker functions from asm parser
llvm-svn: 348269
2018-12-04 14:58:14 +00:00
Simon Pilgrim 6a088b2ce5 Fix MSVC "unknown pragma" warning. NFCI.
llvm-svn: 348256
2018-12-04 12:31:52 +00:00
Simon Pilgrim 58d44235e5 Fix -Wparentheses warning. NFCI.
llvm-svn: 348254
2018-12-04 12:24:10 +00:00
Simon Pilgrim b1d6db7693 [X86] Remove unnecessary peekThroughEXTRACT_SUBVECTORs call.
The GetSplatValue/IsSplatVector call will call this anyhow and the later code is just for a v2i64 type so doesn't need it.

llvm-svn: 348253
2018-12-04 12:21:43 +00:00
Simon Pilgrim 0add090e24 [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686)
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.

The existing code converts both the in-range and out-of-range cases using FP_TO_SINT and then selects the result; this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.
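
For illustration, a rough IR equivalent of that pre-existing expansion (function name illustrative; 0x41E0000000000000 is 2^31): both fptosi conversions execute unconditionally and only the result is selected, so the out-of-range one can raise an FP exception; the fix pre-selects the fptosi input instead:

```
define i32 @fptoui_expand(float %x) {
  %small = fcmp olt float %x, 0x41E0000000000000 ; is x < 2^31?
  %conv = fptosi float %x to i32         ; overflows (may trap) if x >= 2^31
  %adj = fsub float %x, 0x41E0000000000000
  %convadj = fptosi float %adj to i32    ; out-of-range path, biased down by 2^31
  %fixed = xor i32 %convadj, -2147483648 ; add the 2^31 offset back
  %r = select i1 %small, i32 %conv, i32 %fixed
  ret i32 %r
}
```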

The X87 cases don't need the strict flag but generate much nicer code with it.

Differential Revision: https://reviews.llvm.org/D53794

llvm-svn: 348251
2018-12-04 11:21:30 +00:00
Simon Pilgrim 1a2e0200ac Revert rL348121 from llvm/trunk: [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate the
complex numbers extension is to target the v8.3
architecture; after the patch this extension
can be enabled separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

........

This has been causing buildbots failures for the past 24 hours: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

llvm-svn: 348249
2018-12-04 10:55:48 +00:00
Craig Topper 35585aff34 [X86] Remove custom DAG combine for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG.
We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization which was happening as part of legalize DAG instead of as part of legalize vector ops. Now it's part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the lit tests we have.

llvm-svn: 348237
2018-12-04 04:51:07 +00:00
Sanjin Sijaric dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Jessica Paquette 2f5833ecd9 [MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.

llvm-svn: 348219
2018-12-04 00:31:47 +00:00
Krzysztof Parzyszek 44c1f81b27 [Hexagon] Switch to auto-generated intrinsic definitions and patterns
llvm-svn: 348206
2018-12-03 22:40:36 +00:00
Krzysztof Parzyszek 9dafa8a2c6 [Hexagon] Extract operand decoders into a separate file, NFC
These decoders are automatically generated. Keeping them separated makes
updating architectures easier.

llvm-svn: 348196
2018-12-03 21:59:21 +00:00
Sanjay Patel d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.
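
A hypothetical instance of the pattern being narrowed: the 256-bit add feeding the truncate can be shrunk to a 128-bit add on truncated operands when the narrow op is legal:

```
; trunc (add x, y) --> add (trunc x), (trunc y)
define <8 x i16> @narrow_binop(<8 x i32> %x, <8 x i32> %y) {
  %a = add <8 x i32> %x, %y            ; ymm-wide before the combine
  %t = trunc <8 x i32> %a to <8 x i16> ; only the narrow half is used
  ret <8 x i16> %t
}
```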

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Simon Atanasyan f76884b0d3 [mips] Fix TestDWARF32Version5Addr8AllForms test failure on MIPS hosts
The `DIEExpr` is used in debug information entries for either TLS variables
or call sites. For now the last case is unsupported for targets with delay
slots, for MIPS in particular.

The `DIEExpr::EmitValue` method calls a virtual `EmitDebugThreadLocal`
routine which, in case of MIPS, always emits either `.dtprelword` or
`.dtpreldword` directives. That is okay for "main" code, but in unit
tests `DIEExpr` instances can be created not only for TLS variables, even
on MIPS hosts. That is the reason for the `TestDWARF32Version5Addr8AllForms`
failure: handling of the `R_MIPS_TLS_DTPREL` relocation writes an incorrect
value into the DWARF structures. In any case, unconditionally emitting
`.dtprelword` directives will be incorrect when/if debug information
entries for call sites become supported on MIPS.

The patch solves the problem by wrapping the expression created in the
`MipsTargetObjectFile::getDebugThreadLocalSymbol` method into the
`MipsMCExpr` expression with a new `MEK_DTPREL` tag. This tag is
recognized in the `MipsAsmPrinter::EmitDebugThreadLocal` method, and
`.dtprelword` directives are created only in this case. In other cases the
expression is saved as regular data.

Differential Revision: http://reviews.llvm.org/D54937

llvm-svn: 348194
2018-12-03 21:54:43 +00:00
Krzysztof Parzyszek a45a55fc67 [Hexagon] Remove unused encodings, NFC
llvm-svn: 348193
2018-12-03 21:49:12 +00:00
Wouter van Oortmerssen c7b89f0f62 [WebAssembly] Enforce assembler emits to streamer in order.
Summary:
The assembler processes directives and instructions in whatever order
they are in the file, then directly emits them to the streamer. This
could cause badly written (or generated) .s files to produce
incorrect binaries.

It now has state that tracks what it has most recently seen, to
enforce that they are emitted in an order that always produces
correct wasm binaries.

Also added a new test that compares obj2yaml output from llc (the
backend) to that going via .s and the assembler to ensure both paths
generate the same binaries.

The features this test covers could be extended.

Passes all wasm Lit tests.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=39557

Reviewers: sbc100, dschuff, aheejin

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55149

llvm-svn: 348185
2018-12-03 20:30:28 +00:00
Krzysztof Parzyszek 6290a73f29 [Hexagon] Update timing classes
llvm-svn: 348183
2018-12-03 20:13:18 +00:00
Krzysztof Parzyszek 1cbc5cd364 [Hexagon] Change instruction type field in TSFlags to 7 bits
llvm-svn: 348171
2018-12-03 19:34:04 +00:00
Jessica Paquette 2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Krzysztof Parzyszek 71a7f447f6 [Hexagon] Add HasV5 predicate for compatibility with auto-generated files
llvm-svn: 348167
2018-12-03 19:05:42 +00:00
Krzysztof Parzyszek a55515f9a6 [Hexagon] Remove unused operand definitions, NFC
llvm-svn: 348163
2018-12-03 18:54:24 +00:00
Krzysztof Parzyszek 7ecc277ef9 [Hexagon] Some formatting changes, NFC
llvm-svn: 348162
2018-12-03 18:40:15 +00:00
Craig Topper 5440b63fa8 [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.
Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.

After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.

This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55165

llvm-svn: 348159
2018-12-03 18:26:27 +00:00
Craig Topper e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Jonas Paulsson 8ae0f88b13 [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.
A loaded value with multiple users compared with 0 will become a single
load-and-test instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.
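
A hypothetical example of the case being costed: the load keeps a second user, so it is not folded away, but the compare with zero becomes part of a load-and-test:

```
define i64 @load_and_test(i64* %src, i64 %a, i64 %b) {
  %val = load i64, i64* %src
  %cmp = icmp eq i64 %val, 0    ; merged into a load-and-test, so 0 cost
  %sel = select i1 %cmp, i64 %a, i64 %b
  %sum = add i64 %sel, %val     ; second user of the loaded value
  ret i64 %sum
}
```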

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

llvm-svn: 348141
2018-12-03 14:30:18 +00:00
Pablo Barrio a17f855698 [AArch64] Add command-line option for SSBS
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.

Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html

Reviewers: olista01, samparker, aemerson

Reviewed By: samparker

Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D54629

llvm-svn: 348137
2018-12-03 14:00:47 +00:00
Ron Lieberman 16de4fd2eb [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos
The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed
into VOP3 instruction pairs for S_ADD_U64_PSEUDO:
  V_ADD_I32_e64
  V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO
  V_SUB_I32_e64
  V_SUBB_U32_e64
precludes the use of SDWA to encode a constant.
SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.

We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
```
  %19:vgpr_32 = V_AND_B32_e32 255,
      killed %16:vgpr_32, implicit $exec
  %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
      %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
  %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
      %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec
```

which then allows the SDWA encoding and becomes

```
  %47:vgpr_32 = V_ADD_I32_sdwa
      0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
      implicit-def $vcc, implicit $exec
  %48:vgpr_32 = V_ADDC_U32_e32
      0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec
```


Differential Revision: https://reviews.llvm.org/D54882

llvm-svn: 348132
2018-12-03 13:04:54 +00:00
Tim Northover 5745b6ac3b ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

llvm-svn: 348122
2018-12-03 11:16:21 +00:00
Diogo N. Sampaio 3c7d062b6b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate the
complex numbers extension is to target the v8.3
architecture; after the patch this extension
can be enabled separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

llvm-svn: 348121
2018-12-03 11:08:13 +00:00
Oliver Stannard 4cf35b4ab0 [ARM][MC] Move information about variadic register defs into tablegen
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.

This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstr already tracks use/def per operand in each instance
of the instruction, so it can already represent this.

This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.

Differential revision: https://reviews.llvm.org/D54853

llvm-svn: 348114
2018-12-03 10:32:42 +00:00
Oliver Stannard c588110f13 [ARM][Asm] Debug trace for the processInstruction loop
In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.

This adds debug printing so that this process is visible when using the
-debug option.

To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.

Differential revision: https://reviews.llvm.org/D54852

llvm-svn: 348113
2018-12-03 10:21:28 +00:00
Sjoerd Meijer 5afc957eba [ARM] FP16: support vld1.16 for vector loads with post-increment
Differential Revision: https://reviews.llvm.org/D55112

llvm-svn: 348110
2018-12-03 08:26:34 +00:00
Kang Zhang 51986417f9 [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
There are 4 instructions which have an inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD.
These four instructions should set ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 348109
2018-12-03 03:32:57 +00:00
QingShan Zhang 8b7653db72 [NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not.
In theory, we should let the PPC target determine how to lower the TOC entry for globals.
PPCTargetLowering requires this query to do some optimization for TOC_Entry.

Differential Revision: https://reviews.llvm.org/D54925

llvm-svn: 348108
2018-12-03 03:32:16 +00:00
Craig Topper 959b415e2f [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.
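A sketch of the kind of store this combine rewrites (assuming pre-avx512, where vXi1 is not a legal type); the <8 x i1> value is bitcast to i8 and stored as a scalar:

```
define void @store_mask(<8 x i1> %m, <8 x i1>* %p) {
  ; rewritten to: bitcast <8 x i1> %m to i8, then a plain i8 store
  store <8 x i1> %m, <8 x i1>* %p
  ret void
}
```
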
llvm-svn: 348104
2018-12-02 19:47:14 +00:00
Craig Topper 6f54ff57fd [X86] Fix bad comment. NFC
llvm-svn: 348103
2018-12-02 19:47:13 +00:00
Craig Topper 204e4110e0 [X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast.
Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.

llvm-svn: 348087
2018-12-02 07:52:39 +00:00
Craig Topper 4bb077910a [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.
Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.

llvm-svn: 348086
2018-12-02 05:46:50 +00:00
Craig Topper ec096a1dae [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away.

By custom legalizing it we can avoid this churn and maybe produce better code.

llvm-svn: 348085
2018-12-02 05:46:48 +00:00
Jessica Paquette 9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Craig Topper f4b13927e7 [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55138

llvm-svn: 348079
2018-12-01 19:26:31 +00:00
Graham Sellers ba559ac058 [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.

Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.
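
The identity in IR terms (a small illustrative function):

```
; ~(x ^ y) == (~x ^ y) == (x ^ ~y): inverting one input instead of the
; result lets the NOT stay on the scalar unit
define i64 @xnor(i64 %x, i64 %y) {
  %xor = xor i64 %x, %y
  %not = xor i64 %xor, -1
  ret i64 %not
}
```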

Differential: https://reviews.llvm.org/D55071
llvm-svn: 348075
2018-12-01 12:27:53 +00:00
Alex Bradbury 757d296222 [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases
As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>, 
the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. 
SRAW assumed that (sext_inreg foo, i32) could only be produced when 
sign-extending an i32. However, it can be produced by input such as:

```
define i64 @tricky_ashr(i64 %a, i64 %b) {
  %1 = shl i64 %a, 32
  %2 = ashr i64 %1, 32
  %3 = ashr i64 %2, %b
  ret i64 %3
}
```

It's important not to select sraw in the above case, because sraw only uses 
the lower 5 bits of the shift amount, while a shift of 32-63 would be valid.

Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be 
produced when zero-extending a value that was originally i32 in LLVM IR. This
is obviously incorrect.

This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and 
adds test cases that would demonstrate a miscompile if the incorrect patterns 
were re-added.

llvm-svn: 348067
2018-12-01 05:00:00 +00:00
Artem Belevich e5664b1559 [NVPTX] Add lowering of i128 numbers as struct fields
Addition to D34555 - override VTs computation with ComputePTXValueVTs
for struct fields.
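
A hypothetical example of the case now handled, an i128 struct field:

```
%pair = type { i128, i128 }

define i128 @load_first(%pair* %p) {
  ; the i128 field's VTs come from ComputePTXValueVTs rather than generic VTs
  %addr = getelementptr inbounds %pair, %pair* %p, i32 0, i32 0
  %val = load i128, i128* %addr
  ret i128 %val
}
```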

Author: Denys Zariaiev<denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55144

llvm-svn: 348057
2018-12-01 00:21:52 +00:00
Nicolai Haehnle a7b00058e0 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 348050
2018-11-30 22:55:38 +00:00
Nicolai Haehnle a9cc92c247 AMDGPU: Fix various issues around the VirtReg2Value mapping
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

llvm-svn: 348049
2018-11-30 22:55:29 +00:00
Jessica Paquette 1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne 35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Craig Topper 4d80f199e8 [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers.
This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.

llvm-svn: 348019
2018-11-30 18:43:18 +00:00
Craig Topper 8191307d09 [X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked operation when avx512bw/avx512vl is enabled.
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates.

llvm-svn: 348018
2018-11-30 18:43:15 +00:00
Ron Lieberman f48e43bbf7 [AMDGPU] Disable SReg Global LD/ST, perf regression
Differential Revision: https://reviews.llvm.org/D55093

llvm-svn: 348014
2018-11-30 18:29:17 +00:00
Valery Pykhtin 3d9afa273f [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

llvm-svn: 347993
2018-11-30 14:21:56 +00:00
Alex Bradbury 4830fdd21a [RISCV] Add additional CSR instruction aliases (imm. operands)
This patch adds CSR instruction aliases for the cases where the instruction 
takes an immediate operand but the alias doesn't have the i suffix. This is 
necessary for gas/gcc compatibility.

gas doesn't do a similar conversion for fsflags or fsrm, so this should be 
complete.

Differential Revision: https://reviews.llvm.org/D55008
Patch by Luís Marques.

llvm-svn: 347991
2018-11-30 14:10:52 +00:00
Alex Bradbury 26403def69 [RISCV] Add UNIMP instruction (32- and 16-bit forms)
This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit 
form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. 
The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, 
but still follows the 16-bit instruction form (i.e. bits 0-1 != 11).

Until recently unimp was undocumented and supported just by binutils, which 
printed unimp for either the 16 or 32-bit form. Both forms are now documented 
<https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports 
c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>.

Differential Revision: https://reviews.llvm.org/D54316
Patch by Luís Marques.

llvm-svn: 347988
2018-11-30 13:39:17 +00:00
Alex Bradbury e0e62e97df [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend 
operands when it is able to do so. For some targets this is more expensive 
than a sign-extension, which is also a valid choice. Introduce the 
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger 
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == 
MVT::i64, as it can be performed using a single instruction.
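
A hypothetical illustration: when the operands of this i32 compare are promoted to i64 on RV64, sign-extending each operand is a single instruction (sext.w), while zero-extending would need an extra shift pair:

```
define i1 @ult32(i32 %a, i32 %b) {
  ; promoted to a 64-bit compare; operands now extended with sext, not zext
  %c = icmp ult i32 %a, %b
  ret i1 %c
}
```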

Differential Revision: https://reviews.llvm.org/D52978

llvm-svn: 347977
2018-11-30 09:56:54 +00:00
Alex Bradbury bc96a98ed0 [RISCV] Introduce codegen patterns for instructions introduced in RV64I
As discussed in the RFC 
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit 
RISC-V has i64 as the only legal integer type.  This patch introduces patterns 
to support codegen of the new instructions 
introduced in RV64I: addw, addiw, subw, sllw, slliw, srlw, srliw, sraw, 
sraiw, ld, sd.

Custom selection code is needed for srliw as SimplifyDemandedBits will remove 
lower bits from the mask, meaning the obvious pattern won't work:

```
def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
          (SRLIW GPR:$rs1, uimm5:$shamt)>;
```

This is sufficient to compile and execute all of the GCC torture suite for 
RV64I other than those files using frameaddr or returnaddr intrinsics 
(LegalizeDAG doesn't know how to promote the operands - a future patch 
addresses this).

When promoting i32 sltu/sltiu operands, it would be more efficient to use 
sign-extension rather than zero-extension for RV64. A future patch adds a hook 
to allow this.

Differential Revision: https://reviews.llvm.org/D52977

llvm-svn: 347973
2018-11-30 09:38:44 +00:00
Craig Topper a2133061c0 [X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle.
llvm-svn: 347967
2018-11-30 08:32:05 +00:00
Craig Topper 6e4b266a0d [X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up.
Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the upper bits and no shifts. So just emit that directly.

llvm-svn: 347966
2018-11-30 08:32:01 +00:00
Sjoerd Meijer ecc7dcb879 [ARM] Don't expand sdiv when optimising for minsize
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:

sdiv i32 %1, 4

gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.
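
For reference, roughly the expansion being avoided at minsize (three Arm instructions, since the lshr can fold into the add as a shifted operand); a sketch:

```
; sdiv x, 4 with round-toward-zero semantics
define i32 @sdiv4_expanded(i32 %x) {
  %sign = ashr i32 %x, 31    ; 0 for non-negative x, -1 for negative x
  %bias = lshr i32 %sign, 30 ; 0 or 3 (divisor - 1)
  %adj = add i32 %x, %bias   ; bias negative values toward zero
  %q = ashr i32 %adj, 2
  ret i32 %q
}
```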

Differential Revision: https://reviews.llvm.org/D54546

llvm-svn: 347965
2018-11-30 08:14:28 +00:00
Jonas Paulsson b1d014883c [SystemZ::TTI] i8/i16 operands extension costs revisited
Three minor changes to these extra costs:

* For ICmp instructions, instead of adding 2 all the time for extending each
  operand, this is only done if that operand is neither a load nor an
  immediate.

* The operand extension costs for divides were removed, because we now already
  use a high cost for the divide (20).

* The extra costs for lshr/ashr operand extensions were removed, as they did
  not seem useful.

Review: Ulrich Weigand
https://reviews.llvm.org/D55053

llvm-svn: 347961
2018-11-30 07:09:34 +00:00
Craig Topper 0850e8a6b6 [X86] Fix a couple types in SimplifyDemandedVectorEltsForTargetNode. NFCI
We had an EVT variable capturing the result of getSimpleValueType, which returns an MVT. Another place used EVT where it could have been MVT. And an 'int' that should have been 'unsigned'.

llvm-svn: 347959
2018-11-30 06:23:55 +00:00
Mircea Trofin 5e0b21fb45 Fix build warnings introduced in rL347938
Summary:
Suppressed warnings in release builds due to a variable used
only in an assert statement.

Subscribers: llvm-commits, eraman, mgorny

Differential Revision: https://reviews.llvm.org/D55100

llvm-svn: 347939
2018-11-30 01:53:17 +00:00
Mircea Trofin f1a49e8525 Revert "Revert r347596 "Support for inserting profile-directed cache prefetches""
Summary:
This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9.

Fix: correct the use of DenseMap.

Reviewers: davidxl, hans, wmi

Reviewed By: wmi

Subscribers: mgorny, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D55088

llvm-svn: 347938
2018-11-30 01:01:52 +00:00
Thomas Lively 66ea30c7bc [WebAssembly] Expand unavailable integer operations for vectors
Summary:
Expands for vector types all of the integer operations that are
expanded for scalars because they are not supported at all by
WebAssembly.

This CL has no tests because such tests would really be testing the
target-independent expansion, but I'm happy to add tests if reviewers
think it would be helpful.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55010

llvm-svn: 347923
2018-11-29 22:01:01 +00:00
Jonas Devlieghere ccf7d4b4aa Produce an error on non-encodable offsets for darwin ARM scattered relocations.
Scattered ARM relocations for Mach-O only have 24 bits available to
encode the offset. This is not checked but just truncated and can result
in corrupt binaries after linking because the relocations are applied to
the wrong offset. This patch will check and error out in those
situations instead of emitting a wrong relocation.

Patch by: Sander Bogaert (dzn)

Differential revision: https://reviews.llvm.org/D54776

llvm-svn: 347922
2018-11-29 21:58:23 +00:00
Alex Bradbury 66d9a752b9 [RISCV] Implement codegen for cmpxchg on RV32IA
Utilise a similar ('late') lowering strategy to D47882. The changes to 
AtomicExpandPass allow this strategy to be utilised by other targets which 
implement shouldExpandAtomicCmpXchgInIR.

All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. 
This is conservative but correct.
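
A sketch of input that now receives the late lowering (the pseudo is expanded to a load-reserved/store-conditional retry loop after the IR-level passes):

```
define i32 @cas(i32* %ptr, i32 %cmp, i32 %new) {
  %pair = cmpxchg i32* %ptr, i32 %cmp, i32 %new seq_cst seq_cst
  %old = extractvalue { i32, i1 } %pair, 0  ; previous memory value
  ret i32 %old
}
```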

Differential Revision: https://reviews.llvm.org/D48131

llvm-svn: 347914
2018-11-29 20:43:42 +00:00
Craig Topper 73c1d75d58 [X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead.
This seems to produce the same results on the tests we have.

llvm-svn: 347912
2018-11-29 20:18:58 +00:00
David Stuttard c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
Francis Visoiu Mistrih 0b8dd4488e [MachineScheduler] Order FI-based memops based on stack direction
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allows
easier pattern matching.

llvm-svn: 347906
2018-11-29 20:03:19 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will allow the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Craig Topper 6cd0b17078 [X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization
This is another patch for -x86-experimental-vector-widening. This pre-widens narrow division by constants so that we can get past the legal type check in the generic DAG combiner. Otherwise we end up scalarizing.

I've restricted this to splats for now because it was easy to just call DAG.getConstant. I'm not sure what we should do for non-splats: increase the element size? Widen the constant vector by padding with 1?
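
A hypothetical narrow case this pre-widening targets: <4 x i16> is narrower than 128 bits, so without widening the divide would fail the legal-type check and scalarize instead of becoming a multiply+shift:

```
define <4 x i16> @udiv_splat7(<4 x i16> %x) {
  %d = udiv <4 x i16> %x, <i16 7, i16 7, i16 7, i16 7>
  ret <4 x i16> %d
}
```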

Differential Revision: https://reviews.llvm.org/D54919

llvm-svn: 347898
2018-11-29 19:13:38 +00:00
Graham Sellers 04f7a4d2d2 [AMDGPU] Add and update scalar instructions
This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.

New tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).
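
In IR terms, the andn2 pattern is simply a bitwise op with one inverted operand (illustrative function):

```
; matches s_andn2 dst, x, y, which computes x & ~y
define i64 @andn2(i64 %x, i64 %y) {
  %noty = xor i64 %y, -1
  %r = and i64 %x, %noty
  ret i64 %r
}
```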

Differential: https://reviews.llvm.org/D54714
llvm-svn: 347877
2018-11-29 16:05:38 +00:00
David Stuttard 535c1af0bf Fix: Add support for TFE/LWE in image intrinsic
My change svn-id: 347871 caused a buildbot failure due to an unused
variable def (used in an assert).

Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336
llvm-svn: 347876
2018-11-29 15:56:36 +00:00
David Stuttard de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (a later
TODO is to re-enable this and handle it correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

```
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
```

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Hans Wennborg 6e3be9d12e Revert r347596 "Support for inserting profile-directed cache prefetches"
It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for
repro.

This also reverts the follow-ups:
Revert r347724 "Do not insert prefetches with unsupported memory operands."
Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596"
Revert r347607 "Add new passes to X86 pipeline tests"

llvm-svn: 347864
2018-11-29 13:58:02 +00:00
Petr Pavlu e6406d568c [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.

The EnableGlobalISel flag previously had only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347861
2018-11-29 12:56:32 +00:00
Andrea Di Biagio 373a4ccf6c [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666).
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.

A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues.  Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`.  Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).

At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model.  If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.

With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are now correctly classified either as "load
queue full" or "store queue full".

About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage.  This is the main reason why for the modified tests, the load/store
queues get full before PdEx is full.

Differential Revision: https://reviews.llvm.org/D54957

llvm-svn: 347857
2018-11-29 12:15:56 +00:00
Nicolai Haehnle 7bed696915 AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.

Use a simple fixpoint iteration instead.

In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.

Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}

Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54231

llvm-svn: 347853
2018-11-29 11:06:26 +00:00
Nicolai Haehnle ab43bf60fe AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points
Summary:
There is one obsolete reference to using -1 as an indication of "unknown",
but this isn't actually used anywhere.

Using unsigned makes robust wrapping checks easier.
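
For example, with unsigned time points an "A happened after B" test can use well-defined modular subtraction, roughly:

```
#include <cstdint>

// Unsigned subtraction wraps with defined semantics, so the ordering of
// two nearby time points survives a counter wrap-around.
bool happenedAfter(uint32_t A, uint32_t B) {
  return A != B && (A - B) < (UINT32_MAX / 2); // assumes |A - B| stays small
}
```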

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam

Differential Revision: https://reviews.llvm.org/D54230

llvm-svn: 347852
2018-11-29 11:06:21 +00:00
Nicolai Haehnle f96456c611 AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54229

llvm-svn: 347851
2018-11-29 11:06:18 +00:00
Nicolai Haehnle d1f45dad84 AMDGPU/InsertWaitcnts: Simplify pending events tracking
Summary:
Instead of storing the "score" (last time point) of the various relevant
events, only store whether an event is pending or not.

This is sufficient, because whenever only one event of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).
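
A toy version of that bookkeeping, with made-up event names, might look like:

```
#include <cstdint>

// One bit per wait event kind; the names here are placeholders.
enum WaitEventSketch : uint32_t {
  VMEM_ACCESS = 1u << 0,
  LDS_ACCESS  = 1u << 1,
  SMEM_ACCESS = 1u << 2,
};

struct PendingEventsSketch {
  uint32_t Bits = 0;
  void record(uint32_t Event) { Bits |= Event; }
  // More than one pending event kind in a counter class means the class
  // has gone out of order and only an s_waitcnt to 0 can clear it.
  bool outOfOrder(uint32_t ClassMask) const {
    uint32_t Pending = Bits & ClassMask;
    return Pending != 0 && (Pending & (Pending - 1)) != 0;
  }
};
```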

This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK.
I do not understand what this special handling ever attempted to achieve.
It has existed ever since the original port from an internal code base,
so my best guess is that it solved a problem related to EXEC handling in
that internal code base.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54228

llvm-svn: 347850
2018-11-29 11:06:14 +00:00
Nicolai Haehnle ae369d70c3 AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types
Summary:
It hides the type casting ugliness, and I happened to have to add a new
such loop (in a later patch).

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54227

llvm-svn: 347849
2018-11-29 11:06:11 +00:00
Nicolai Haehnle 1a94cbb3f5 AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.
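
Presumably the structure is along these lines (a sketch; the field names are guesses based on the hardware counter names):

```
// Decoded s_waitcnt; ~0u marks "no wait required" on a counter.
struct WaitcntSketch {
  unsigned VmCnt = ~0u;   // vector memory operations
  unsigned ExpCnt = ~0u;  // exports / GDS
  unsigned LgkmCnt = ~0u; // LDS, GDS, constant and message operations

  bool requiresWait() const {
    return VmCnt != ~0u || ExpCnt != ~0u || LgkmCnt != ~0u;
  }
};
```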

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

llvm-svn: 347848
2018-11-29 11:06:06 +00:00
Craig Topper c2540995ed [X86] Correct comment. NFC
llvm-svn: 347835
2018-11-29 05:56:03 +00:00
Li Jia He bcae407a3c [PowerPC] Fix a conversion that is not considered during instruction selection for the ISD::BR_CC node
Summary:
 A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code 
 includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the 
 most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative
 so for signed comparisons vs. unsigned ones, the input operands just need to be swapped.
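
The effect is easy to reproduce in plain C++; this small program (illustrative only) enumerates all four i1 pairs and shows the signed and unsigned "less than" results are opposite whenever the bits differ:

```
#include <cstdio>

int main() {
  for (int A = 0; A <= 1; ++A) {
    for (int B = 0; B <= 1; ++B) {
      int SA = A ? -1 : 0, SB = B ? -1 : 0; // sign-extended i1: 1 means -1
      bool SLT = SA < SB;                   // signed i1 "less than"
      bool ULT = A < B;                     // unsigned i1 "less than"
      std::printf("a=%d b=%d slt=%d ult=%d\n", A, B, (int)SLT, (int)ULT);
    }
  }
  return 0;
}
```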

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54825

llvm-svn: 347831
2018-11-29 03:04:39 +00:00
Sanjay Patel 2de209313e [x86] try select simplification for target-specific nodes
This failed to select (which might be a separate bug) in
X86ISelDAGToDAG because we try to create a select node
that can be simplified away after rL347227.

This change avoids the problem by simplifying the SHRUNKBLEND
node sooner. In the test case, we manage to realize that the
true/false values of the select (SHRUNKBLEND) are the same thing,
so it simplifies away completely.

llvm-svn: 347818
2018-11-28 22:51:04 +00:00
Craig Topper 81f1b4a361 [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal.
Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.

This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization, meaning 512-bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature.

Differential Revision: https://reviews.llvm.org/D54984

llvm-svn: 347786
2018-11-28 18:11:42 +00:00
Craig Topper d3bb036bc9 [X86] Add some cost model entries for sext/zext for avx512bw
This fixes some of the scalarization costs reported for sext/zext using avx512bw. It does not fix all the scalarization costs being reported, just the worst ones.

I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.

Differential Revision: https://reviews.llvm.org/D54979

llvm-svn: 347785
2018-11-28 18:11:39 +00:00
Craig Topper f3b6f583e2 [X86] Add a combine for back to back VSRAI instructions
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI.
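
In scalar terms this is the usual shift merge, with the combined amount clamped so an over-wide arithmetic shift still just replicates the sign bit (a sketch of the idea, not the DAG combine itself):

```
#include <cstdint>

// sra(sra(X, A), B) == sra(X, min(A + B, 15)) for an i16 lane.
int16_t combinedSra(int16_t X, unsigned A, unsigned B) {
  unsigned Total = A + B;
  if (Total > 15)
    Total = 15; // a VSRAI by >= bitwidth saturates to all sign bits
  return static_cast<int16_t>(X >> Total);
}
```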

Differential Revision: https://reviews.llvm.org/D54959

llvm-svn: 347784
2018-11-28 18:03:38 +00:00
Alex Bradbury 893e5bc774 [RISCV] Support .option push and .option pop
This adds support in the RISCVAsmParser the storing of Subtarget feature bits to a stack so that they can be pushed/popped to enable/disable multiple features at once.

Differential Revision: https://reviews.llvm.org/D46424
Patch by Lewis Revill.

llvm-svn: 347774
2018-11-28 16:39:14 +00:00
Francis Visoiu Mistrih 879087ce5b [MachineScheduler] Add support for clustering mem ops with FI base operands
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:

```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
  unsigned long long args[8];
  args[0] = key;
  args[1] = index;
  use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
  dst[0] = a;
  dst[1] = b;
}
```

The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.

This patch adds support for FI base operands.

Differential Revision: https://reviews.llvm.org/D54847

llvm-svn: 347747
2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih d7eebd6d83 [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

llvm-svn: 347746
2018-11-28 12:00:20 +00:00
Simon Atanasyan af860d44fe [DebugInfo] Rename EmitDebugThreadLocal back to EmitDebugValue. NFC
This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr
for PCOffset. In that case the expression is unrelated to thread locals
and so emitting a value of the DIEExpr does not have to always mean
emit-debug-thread-local.

llvm-svn: 347744
2018-11-28 11:48:07 +00:00
Jonas Paulsson 06acb3a236 [SystemZ::TTI] Improve cost for compare of i64 with extended i32 load
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.

This patch makes such a load considered foldable and so gets a 0 cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D54944

llvm-svn: 347735
2018-11-28 08:58:27 +00:00
Jonas Paulsson d6b7aca911 [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.

As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.

Review: Ulrich Weigand
https://reviews.llvm.org/D54940

llvm-svn: 347734
2018-11-28 08:31:50 +00:00
Jonas Paulsson 011a503f25 [SystemZ::TTI] Improved cost values for comparison against memory.
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.

This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.

Review: Ulrich Weigand
https://reviews.llvm.org/D54897

llvm-svn: 347733
2018-11-28 08:08:05 +00:00
Jonas Paulsson 5da8e432b9 [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.

Review: Ulrich Weigand
https://reviews.llvm.org/D54870

llvm-svn: 347732
2018-11-28 07:52:34 +00:00
Mircea Trofin 35f0e5cd2d Do not insert prefetches with unsupported memory operands.
Summary:
Ignore advices where the memory operand of the 'anchor' instruction
uses unsupported register types.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54983

llvm-svn: 347724
2018-11-28 01:08:45 +00:00
Evandro Menezes 9ef79c884a [TableGen] Refactor macro names (NFC)
Make the names for the macros for `TargetInstrInfo` uniform.

llvm-svn: 347706
2018-11-27 20:58:27 +00:00
Craig Topper 7ceef03dc9 [X86] Replace an APInt that is guaranteed to be 8-bits with just an 'unsigned'
We're already mixing this APInt with other 'unsigned' variables. This allows us to use regular comparison operators instead of needing to use APInt::ult or APInt::uge. And it removes a later conversion from APInt to unsigned.

I might be adding another combine to this function and this will probably simplify the logic required for that.

llvm-svn: 347684
2018-11-27 18:24:56 +00:00
Craig Topper 5fb34b5498 [X86] Add cascade lake arch in X86 target.
This is skylake-avx512 with the addition of avx512vnni ISA.

Patch by Jianping Chen

Differential Revision: https://reviews.llvm.org/D54785

llvm-svn: 347681
2018-11-27 18:05:00 +00:00
Stanislav Mekhanoshin 443a7f9788 [AMDGPU] Disable DAG combine at -O0
Differential Revision: https://reviews.llvm.org/D54358

llvm-svn: 347659
2018-11-27 15:13:37 +00:00
Craig Topper 196fd31e33 [X86] Use getUnpackl/getUnpackh instead of directly creating UNPCKL/UNPCKH nodes.
llvm-svn: 347642
2018-11-27 06:24:56 +00:00
Craig Topper 4325505f05 [X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets.
If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk.

We probably need to do something if the store itself is a vXi1 type, but I'll leave that until a testcase appears.

llvm-svn: 347632
2018-11-27 02:57:27 +00:00
Sterling Augustine 9cc1ffadc5 Notify the linker when a TU compiled with split-stack has a function without a prologue.
More context here: https://go-review.googlesource.com/c/go/+/148819/

llvm-svn: 347614
2018-11-26 23:26:31 +00:00
David Blaikie 1fecbec5fa AArch64ISelLowering: Remove a return-of-assignment to allow NRVO
Patch by Arthur O'Dwyer!

llvm-svn: 347609
2018-11-26 22:57:18 +00:00
Mircea Trofin 183df14520 Add new passes to X86 pipeline tests
Summary: Fixes test failures introduced by rL347596.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54916

llvm-svn: 347607
2018-11-26 22:49:17 +00:00
Fangrui Song 82ddb8154e [X86] Add dependency from X86 to ProfileData after rL347596
llvm-svn: 347606
2018-11-26 22:16:19 +00:00
Evandro Menezes 6a38a5effe [AArch64] Refactor the scheduling predicates (3/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasExtendedReg()`.

Differential revision: https://reviews.llvm.org/D54822

llvm-svn: 347599
2018-11-26 21:47:46 +00:00
Evandro Menezes 56368c6fa5 [AArch64] Refactor the scheduling predicates (2/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasShiftedReg()`.

Differential revision: https://reviews.llvm.org/D54820

llvm-svn: 347598
2018-11-26 21:47:41 +00:00
Evandro Menezes b02ac8bd21 [AArch64] Refactor the scheduling predicates (1/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::isScaledAddr()`.

Differential revision: https://reviews.llvm.org/D54777

llvm-svn: 347597
2018-11-26 21:47:28 +00:00
Mircea Trofin cfbc1788d6 Support for inserting profile-directed cache prefetches
Summary:
Support for profile-driven cache prefetching (X86)

This change is part of a larger system, consisting of a cache prefetch recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM.

A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp.

The recommender makes prefetch recommendations in terms of:

* the binary offset of an instruction with a memory operand;
* a delta;
* and a type (nta, t0, t1, t2)

meaning: a prefetch of that type should be inserted right before the instruction at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access.

For example:

0x400ab2,64,nta

and assuming the instruction at 0x400ab2 is:

movzbl (%rbx,%rdx,1),%edx

means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such:

prefetchnta 0x40(%rbx,%rdx,1)
movzbl (%rbx, %rdx, 1), %edx

The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well):

1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information).

2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept).

3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations.

4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3.

Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4.

The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism.

Reviewers: davidxl, wmi, craig.topper

Reviewed By: davidxl, wmi, craig.topper

Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D54052

llvm-svn: 347596
2018-11-26 21:36:18 +00:00
Matt Arsenault 88ce3dcbc8 AMDGPU: Record SGPR spills when restoring too
It's possible in some cases to have a restore present
without a corresponding spill. Due to an apparent bug
in D54366 <https://reviews.llvm.org/D54366>, only the
restore for a register was emitted. It's probably
always a bug for this to happen, but due to how SGPR
spilling is implemented, this makes the issues appear
worse than it is.

llvm-svn: 347595
2018-11-26 21:28:40 +00:00
Craig Topper b955bf382c [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT.
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.

This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.

Differential Revision: https://reviews.llvm.org/D54906

llvm-svn: 347593
2018-11-26 21:12:39 +00:00
Than McIntosh 30c804bbb1 [CodeGen] Support custom format of stack maps
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.

For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.

This patch authored by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, apilipenko, reames

Reviewed By: reames

Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53892

llvm-svn: 347584
2018-11-26 18:43:48 +00:00
Matt Arsenault 105fc1a5f3 AMDGPU: Don't optimize exec masks at -O0
llvm-svn: 347573
2018-11-26 17:02:02 +00:00
Matt Arsenault 6384d9ea31 AMDGPU: Only add implicit super-reg def for first subreg
llvm-svn: 347572
2018-11-26 17:02:01 +00:00
Sanjay Patel d31220e0de [x86] promote all multiply i8 by constant to i32
We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's 
independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of 
our existing LEA and other multiply expansion magic happens as it would for i32 ops.
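
The promotion is sound because an i8 multiply only needs the low 8 bits of the product, and a 32-bit multiply preserves those; roughly:

```
#include <cstdint>

// Multiplying in 32 bits and truncating yields the same low byte as an
// 8-bit multiply, so the i8 op can reuse the existing i32 LEA expansion.
uint8_t mulByConst(uint8_t X) {
  uint32_t Wide = static_cast<uint32_t>(X) * 5; // may lower to LEA on x86
  return static_cast<uint8_t>(Wide);            // truncate back to i8
}
```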

Some of the test diffs show that we could end up with an actual 32-bit mul instruction here 
because we choose not to expand to simpler ops. That instruction could be slower depending on the 
subtarget. On the plus side, this means we don't need a separate instruction to load the constant 
operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, 
we could add a later transform that tries to shrink it back to i8 based on subtarget timing.

I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to 
test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means 
LEA is generally cheap.

Differential Revision: https://reviews.llvm.org/D54803

llvm-svn: 347557
2018-11-26 15:22:30 +00:00
Diana Picus 0528e2cfb3 [ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.

Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.

In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).

We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).
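
The scalar shape of that lowering, as a sketch (with CTLZ_ZERO_UNDEF standing in for either the native CLZ or the __clzsi2 libcall):

```
#include <cstdint>

unsigned ctlzZeroUndef(uint32_t X) {
  return __builtin_clz(X); // undefined for X == 0, like G_CTLZ_ZERO_UNDEF
}

unsigned ctlz(uint32_t X) {
  return X == 0 ? 32u : ctlzZeroUndef(X); // guard the zero case explicitly
}
```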

Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.

Instruction select:
Nothing to do.

llvm-svn: 347545
2018-11-26 11:07:02 +00:00
Sam Parker 5338f7aae4 [ARM] Prevent parallel macs for unsigned values
Both zext and sext are currently allowed during the search for narrow
sequences and sexts operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.

Differential Revision: https://reviews.llvm.org/D54790

llvm-svn: 347542
2018-11-26 10:22:55 +00:00
Kang Zhang 840e98f9f1 Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction"
This reverts commit r347532. I forgot to add the option
-mtriple powerpc64-unknown-linux-gnu, so every platform except
PowerPC hit errors.

llvm-svn: 347534
2018-11-26 07:15:31 +00:00
Kang Zhang e98d4f511c [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
There are 4 instructions which have an inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm; they are LFS, LFD, STFS, STFD.
These four instructions should set ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 347532
2018-11-26 06:03:25 +00:00
Sanjay Patel 7336e7c67a [x86] limit transform for select-of-fp-constants
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.

If we have blendv/conditional move, then we should assume 
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use 
the values in the blend/mov.

llvm-svn: 347526
2018-11-25 17:27:02 +00:00
Fangrui Song 220f2a9cac [ARM] Add dependency from ARMAsmParser to ARMAsmPrinter after r347494
This fixes -DBUILD_SHARED_LIBS=on

llvm-svn: 347506
2018-11-23 23:43:46 +00:00
Luke Cheeseman 6db3a6a4a7 Revert r347490 as it breaks address sanitizer builds
llvm-svn: 347499
2018-11-23 17:13:06 +00:00
Oliver Stannard 173bc2bb7f [ARM][AsmParser] Improve debug printing of parsed asm operands
In ARMOperand::print:
- Print human-readable register names, instead of numbers.
- Print the correct names for IT condition masks (these were in the wrong order
  before).
- Print all parts of memory operands, not just the base register.

This makes the output of llvm-mc -show-inst-operands more readable.

Differential revision: https://reviews.llvm.org/D54850

llvm-svn: 347494
2018-11-23 14:27:21 +00:00
Luke Cheeseman d6dbd64104 Revert r343341
- Cannot reproduce the build failure locally and the build logs have
  been deleted.

llvm-svn: 347490
2018-11-23 11:01:47 +00:00
John Brawn d6e0ebea10 [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.

llvm-svn: 347456
2018-11-22 11:45:23 +00:00
Jonas Paulsson 96782c2c0b [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

llvm-svn: 347445
2018-11-22 07:17:29 +00:00
Stefan Pintilie 9d6445d34c [PowerPC][NFC] Split PPCMCCodeEmitter into header and cpp file.
This is further cleanup for PPCMCCodeEmitter. The class had been contained
within the cpp file alone. Now it has been split up between a header file and
a cpp file which allows other classes to make use of the functions in this class
if required.

llvm-svn: 347428
2018-11-21 21:23:50 +00:00
Stefan Pintilie 46e3cd76e2 [PowerPC][NFC] Minor Code Cleanup for PPCMCCodeEmitter.
llvm-svn: 347422
2018-11-21 20:47:59 +00:00
Sanjay Patel cadf62f360 [x86] fix predicate for avoiding vblendv
It only makes sense to produce the logic ops when 1 of the
constants is +0.0. Otherwise, go with vblendv to reduce code.

llvm-svn: 347403
2018-11-21 18:02:50 +00:00
Vladimir Stefanovic 64ad1cf24b [mips][mc] Add basic support for R_MIPS_JALR/R_MICROMIPS_JALR
R_MIPS_JALR/R_MICROMIPS_JALR can now be parsed in .s files and emitted to .o.
They are still not generated with JALR.

Differential revision: https://reviews.llvm.org/D54721

llvm-svn: 347398
2018-11-21 16:38:34 +00:00
Michal Gorny 71e902101d [nios2] Add missing Nios2CodeGen -> Nios2AsmPrinter linkage
Add missing linkage from Nios2CodeGen library to Nios2AsmPrinter
library.  The missing dependency causes shared-lib build to fail with
the following reason:

  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmMemoryOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter21PrintAsmMemoryOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x2b): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter15PrintAsmOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x97): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  collect2: error: ld returned 1 exit status

Differential Revision: https://reviews.llvm.org/D47810

llvm-svn: 347387
2018-11-21 11:25:01 +00:00
Simon Pilgrim 66bae9aee8 [X86][AVX] Remove BROADCAST if we only need the 0'th element
We don't catch this with target shuffle simplification if the src/dst types are different.

llvm-svn: 347386
2018-11-21 11:00:09 +00:00
Craig Topper e9b4001a82 [X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1.
The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32 mode for example.

llvm-svn: 347380
2018-11-21 07:01:22 +00:00
Nemanja Ivanovic 5cf902ccd4 [PowerPC] Do not use vectors to codegen bswap with Altivec turned off
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334

llvm-svn: 347376
2018-11-21 02:53:50 +00:00
Craig Topper 27a5896fe8 [X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly.
These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel.

Fixes PR39733

llvm-svn: 347375
2018-11-21 01:39:38 +00:00
Craig Topper aa52ee2770 [X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8.
We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering.

Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets.

llvm-svn: 347361
2018-11-20 22:57:48 +00:00
Craig Topper 24b346da42 [X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets.
Previously we emitted two separate shuffles, one for unpcklbw and one for unpcklwd. Instead, emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine.

Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll

llvm-svn: 347348
2018-11-20 21:21:52 +00:00
Sam Clegg 4791a668f5 [WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0
Rather than assuming that `tempRet0` exists in linear memory, only assume
that the getter/setter functions exist.  This avoids conflicting with
binaryen, which declares a wasm global for this purpose and defines its
own getter and setter for that.

The other advantage of doing things this way is that it leaves
it up to the linker/finalizer to decide how to actually store this
temporary.  As it happens, binaryen uses a wasm global, which is more
appropriate since it is thread safe.

This also allows us to change the way this is stored in the future
(memory, TLS memory, wasm global) without modifying LLVM.

This is part of a 4 part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/WebAssembly/binaryen/pull/1709

Differential Revision: https://reviews.llvm.org/D53240

llvm-svn: 347340
2018-11-20 19:25:07 +00:00
Jinsong Ji 9a0ed20072 [PowerPC] Add Itineraries for STWU/STWUX etc
When doing some instruction scheduling work, we noticed some missing itineraries.

Before we switched to the machine scheduler, those missing itineraries might not have had
an impact on actual scheduling, because we could still get the same latency from the default values.

With the machine scheduler, however, itineraries do have an impact on scheduling.
For example, NumMicroOps defaults to 0 if there are no itineraries for a specific instruction class,
while most instruction classes with itineraries have NumMicroOps default to 1.

This affects the count of RetiredMOps and the Pending/Available queues,
which in turn can cause different or suboptimal scheduling.

This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.

Since there are already multiple IIC for store update, this patch also merge
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX

and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.

Differential Revision: https://reviews.llvm.org/D54700

llvm-svn: 347311
2018-11-20 15:11:42 +00:00
Simon Pilgrim c9cc6cca42 Fix MSVC 'truncation of constant value' warning. NFCI.
llvm-svn: 347308
2018-11-20 14:29:40 +00:00
Simon Pilgrim ee8b96f253 [X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions.
Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits.

llvm-svn: 347303
2018-11-20 13:23:37 +00:00
Simon Pilgrim ed7e2fda18 [X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero
Noticed while working on improving demanded elts target shuffle combining.

llvm-svn: 347302
2018-11-20 12:17:50 +00:00
Simon Pilgrim a6fb85ffa7 [X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE.
As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.

llvm-svn: 347300
2018-11-20 11:46:37 +00:00
Simon Pilgrim 7198506ba8 [X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions.
As discussed on rL347240.

llvm-svn: 347299
2018-11-20 11:09:46 +00:00
Craig Topper 17fa42a69b [X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef.
Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.

As near as I can tell this makes v16i8 behavior consistent with every other VT now.

This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.

llvm-svn: 347296
2018-11-20 09:04:01 +00:00
Craig Topper b06d1aa3a1 [X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1
This helps with a future patch and makes us less reliant on DAG combine merging shuffles.

llvm-svn: 347295
2018-11-20 09:03:58 +00:00
Craig Topper c733c7bf94 [X86] Replace more calls to getZeroVector with regular getConstant.
getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.

The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.

llvm-svn: 347290
2018-11-20 06:54:01 +00:00
Nemanja Ivanovic 9b393909e2 [PowerPC] Don't combine to bswap store on 1-byte truncating store
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.

llvm-svn: 347288
2018-11-20 04:42:31 +00:00
Craig Topper 808d0dd689 [X86] Rename combineVSZext->combineExtendVectorInreg. NFC
Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use.

llvm-svn: 347268
2018-11-19 22:18:47 +00:00
Konstantin Zhuravlyov 700b1ef54d AMDGPU: Fix V_FMA_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D54545

llvm-svn: 347265
2018-11-19 21:10:16 +00:00
Stanislav Mekhanoshin 8bafbae889 [AMDGPU] Restored selection of scalar_to_vector (v2x16)
This works if DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.

Differential Revision: https://reviews.llvm.org/D54718

llvm-svn: 347259
2018-11-19 19:58:13 +00:00
Craig Topper a5e0380c30 [X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about
We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but that the X86-specific tables don't care about. Our base class implementation takes care of them; we'd just like the X86 backend to ignore them.

This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see.

Differential Revision: https://reviews.llvm.org/D54711

llvm-svn: 347248
2018-11-19 18:57:31 +00:00
Simon Pilgrim c4861ab170 [X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703)
SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.

Differential Revision: https://reviews.llvm.org/D54707

llvm-svn: 347242
2018-11-19 18:40:59 +00:00
Craig Topper 311bbcd535 [X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane.
Previously we split the vectors in half to allow the two halves to be any-extended, then concatenated the results back together.

This patch instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. Multiplies all the low half lanes and high half lanes together in separate operations. Then merges the half lane results back together using packuswb.

Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.

Differential Revision: https://reviews.llvm.org/D54668

llvm-svn: 347240
2018-11-19 18:32:53 +00:00
Fangrui Song d83a5526d5 [AMDGPU] Fix -Wunused-variable
llvm-svn: 347234
2018-11-19 17:54:27 +00:00
Stanislav Mekhanoshin 054f8101f1 [AMDGPU] Convert insert_vector_elt into set of selects
This allows to avoid scratch use or indirect VGPR addressing for
small vectors.

Differential Revision: https://reviews.llvm.org/D54606

llvm-svn: 347231
2018-11-19 17:39:20 +00:00
Wouter van Oortmerssen 49482f824a [WebAssembly] replaced .param/.result by .functype
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
  of this directive anymore.
- Actually constructs the signature in the assembler, and make the
  assembler own it.
- Refactor the use of MVT vs ValType in the streamer and assembler
  to require less conversions overall.
- Changed 700 or so tests to use it.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D54652

llvm-svn: 347228
2018-11-19 17:10:36 +00:00
David Stuttard be3d7ba9fb [AMDGPU] Derive GCNSubtarget from MF to get overridden target features
Summary:
AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the
TM. However, this means that overridden target features are not detected and can
result in incorrect behaviour.

Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere
in the same function).

Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54301

llvm-svn: 347221
2018-11-19 15:44:20 +00:00
Martin Elshuber fef3036d37 [CodeGen] Add pass to combine interleaved loads.
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.

The pass extends LLVM's capabilities to use target-specific instruction
selection of interleaved load patterns (e.g. ld4 on AArch64
architectures).

Differential Revision: https://reviews.llvm.org/D52653

llvm-svn: 347208
2018-11-19 14:26:10 +00:00
Nicolai Haehnle c548d91419 AMDGPU/InsertWaitcnts: Some more const-correctness
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54225

llvm-svn: 347192
2018-11-19 12:03:11 +00:00
Sam Parker e7c42dd7e2 [ARM] Remove trunc sinks in ARM CGP
Truncs are treated as sources if they produce a value of the same
type as the one we are currently trying to promote. Truncs used to be
considered a sink if their operand was of the same value type.
    
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
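
In scalar form that rewrite is just a mask, along the lines of this sketch:

```
#include <cstdint>

// A trunc-to-i8 source becomes an AND with 0xff: the same low bits
// survive, but the value stays in the wider type the promotion works on.
uint32_t truncAsMask(uint32_t X) {
  return X & 0xff;
}
```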
    
This leaves sinks as being:
  - points where the value in the register is being observed, such as
    an icmp, switch or store.
  - points where value types have to match, such as calls and returns.
  - zext are included to ease the transformation and are generally
    removed later on.
    
During this change, it also became apparent that truncating sinks were
broken: if a sink used a source, its type information had already
been lost by the time the truncation happened. So I've changed the
method of caching the type information.

Differential Revision: https://reviews.llvm.org/D54515

llvm-svn: 347191
2018-11-19 11:34:40 +00:00
Anton Korobeynikov 4df19b75c0 [MSP430] Optimize srl/sra in case of A >> (8 + N)
There are no variable-length shifts on MSP430, so we
"eat" 8 bits of the shift via bswap & ext.

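In scalar form the trick splits the shift, roughly as in this sketch:

```
#include <cstdint>

// A >> (8 + N) == (A >> 8) >> N; the ">> 8" part maps to a byte swap plus
// an extension on MSP430, leaving only N single-bit shifts.
uint16_t lshrBy8PlusN(uint16_t A, unsigned N) {
  uint16_t HighByte = A >> 8;
  return HighByte >> N; // N in [0, 7]
}
```
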
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54623

llvm-svn: 347187
2018-11-19 10:43:02 +00:00
Craig Topper 8b22bcd39f [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering.
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.

llvm-svn: 347185
2018-11-19 07:22:26 +00:00
Craig Topper 3616891046 [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.

llvm-svn: 347181
2018-11-19 04:33:20 +00:00
Craig Topper 053f1eea96 [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization.
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.

llvm-svn: 347180
2018-11-19 00:33:16 +00:00
Simon Pilgrim 7f92efa5a9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.
llvm-svn: 347177
2018-11-18 22:13:31 +00:00
Craig Topper 0468c860b7 [X86] Add custom type legalization for extending v4i8/v4i16->v4i64.
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result.

When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq.

llvm-svn: 347176
2018-11-18 21:28:50 +00:00
Simon Pilgrim b31bdbd2e9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts.
SSE vector shifts only use the bottom 64-bits of the shift amount vector.

llvm-svn: 347173
2018-11-18 20:21:52 +00:00
Craig Topper 11d50948e2 [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends.
If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.

This patch disables combineToExtendVectorInReg when we are using widening.

I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.

I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then split into two 128-bit pieces, extend each half to 256, and then concat the results. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.

llvm-svn: 347172
2018-11-18 18:11:25 +00:00
Craig Topper bc8148f7b0 [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction.
Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54671

llvm-svn: 347171
2018-11-18 17:59:28 +00:00
Simon Pilgrim ec808cf541 Remove unused variable. NFCI.
llvm-svn: 347169
2018-11-18 17:24:59 +00:00
Simon Pilgrim 50828c75d0 [X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector
Refactor towards making this recursive (necessary for PR38243 rotation splat detection).
IsSplatVector returns the original vector source of the splat and the splat index.
GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.

llvm-svn: 347168
2018-11-18 17:15:06 +00:00
Simon Pilgrim fec9f8657b [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts.
Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts.

llvm-svn: 347162
2018-11-18 15:52:08 +00:00
Simon Pilgrim cc1f5d2407 [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549)
We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs.

llvm-svn: 347158
2018-11-18 13:34:53 +00:00
Heejin Ahn e0f8b9bfc6 [WebAssembly] Add null streamer support
Summary: Now `llc -filetype=null` works.

Reviewers: eush

Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54660

llvm-svn: 347155
2018-11-18 11:58:47 +00:00
Craig Topper cd94a7c227 [X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI.
I don't yet have any test cases for this, but it's the right thing to do based on log file inspection.

llvm-svn: 347151
2018-11-18 08:30:09 +00:00
Craig Topper b03f80a21c [X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC
llvm-svn: 347150
2018-11-18 07:35:08 +00:00
Craig Topper f56a57518d [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead.
The zero extend will require two stages of unpacks to implement. So its better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.

llvm-svn: 347149
2018-11-18 05:53:21 +00:00
Craig Topper 0438d791fa [X86] Add support for matching PACKUSWB from a v64i8 shuffle.
llvm-svn: 347143
2018-11-17 18:54:43 +00:00
Craig Topper dd61f11642 [X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256.
llvm-svn: 347131
2018-11-17 02:36:07 +00:00
Craig Topper b05ea28f1f [X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask.
llvm-svn: 347127
2018-11-17 02:18:12 +00:00
Fangrui Song 7570932977 Use llvm::copy. NFC
llvm-svn: 347126
2018-11-17 01:44:25 +00:00
Craig Topper ee0333b4a9 [X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization.
This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur.

There are some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement.

llvm-svn: 347105
2018-11-16 22:53:00 +00:00
Craig Topper 87bc07b3dd [X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization.
If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed.

llvm-svn: 347100
2018-11-16 22:04:29 +00:00
Craig Topper 567aaeb40d [X86] Remove a branch on SSE4.1 from LowerLoad
We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG.

llvm-svn: 347095
2018-11-16 21:05:00 +00:00
Craig Topper 7fff9a9aef [X86] In LowerLoad, fix assert messages and rename a variable that used Zize instead of Size. NFC
llvm-svn: 347093
2018-11-16 21:04:56 +00:00
Peter Collingbourne 527024469a AArch64: Emit a call frame instruction for the shadow call stack register.
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.

Differential Revision: https://reviews.llvm.org/D54609

llvm-svn: 347089
2018-11-16 20:08:54 +00:00
Anton Korobeynikov e5cb1c35b4 [MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54626

llvm-svn: 347080
2018-11-16 19:36:15 +00:00
Rong Xu 3a38175723 [X86] Disable Condbr_merge pass
Disable Condbr_merge pass for now due to PR39658.
Will reenable the pass once the bug is fixed.

llvm-svn: 347079
2018-11-16 19:35:00 +00:00
Stefan Pintilie 9004444d81 Revert "[PowerPC] Make no-PIC default to match GCC - LLVM"
This reverts commit r347069

llvm-svn: 347076
2018-11-16 19:24:23 +00:00
Anton Korobeynikov 883c70959d [MSP430] Use R_MSP430_16_BYTE type for FK_Data_2 fixup
The linker fails to link an example like this (simplified case from newlib
sources):

$ cat test.c

extern const char _ctype_b[];
struct _t { char *ptr; };
struct _t T = { ((char *) _ctype_b + 3) };
$ cat ctype.c

char _ctype_b[4] = { 0, 0, 0, 0 };
LD: test.o:(.data+0x0): warning: internal error: unsupported relocation error

We also follow the GNU toolchain here, where a 2-byte relocation is mapped
to R_MSP430_16_BYTE instead of R_MSP430_16.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54620

llvm-svn: 347074
2018-11-16 19:20:51 +00:00
Sam Clegg 74f5fd4e32 [WebAssembly] Default to static reloc model
Differential Revision: https://reviews.llvm.org/D54637

llvm-svn: 347073
2018-11-16 18:59:51 +00:00
Stefan Pintilie 046eff502f [PowerPC] Make no-PIC default to match GCC - LLVM
Set -fno-PIC as the default option.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 347069
2018-11-16 18:36:21 +00:00
Simon Pilgrim 66f42ea6e1 [SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI.
Prep work for PR39467.

llvm-svn: 347067
2018-11-16 17:50:59 +00:00
Simon Pilgrim bcd6631a2a [X86][SSE] Move number of input limit out of resolveTargetShuffleInputs.
Only combineX86ShufflesRecursively needs this limit.

llvm-svn: 347054
2018-11-16 15:01:05 +00:00
Roman Lebedev 90c5b3f78e [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`
Summary:
As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!),
we can fold the `Z` into 'control`, and let the `BEXTR` do this too.
We could just insert those 8 bits of shift amount into control,
but it is better to instead zero-extend them, and 'or' them in place.

We can only do this for `lshr`, not `ashr`, because we do not know that the mask covers only the bits of `Y`,
and not any of the sign-extended bits.
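
A scalar model of why this is safe, using the control-word layout from the quotes below (a sketch; the helper is illustrative):

```
#include <cstdint>

// BEXTR model: START in control bits 7:0, LENGTH in bits 15:8.
uint64_t bextr(uint64_t Src, uint32_t Control) {
  uint32_t Start = Control & 0xff;
  uint32_t Len = (Control >> 8) & 0xff;
  if (Start >= 64)
    return 0; // START past the operand size extracts nothing
  uint64_t V = Src >> Start;
  return Len >= 64 ? V : V & ((1ULL << Len) - 1);
}

// With Control = (Len << 8) | Start, a *logical* shift folds into START:
//   bextr(X >> Z, (L << 8) | S) == bextr(X, (L << 8) | (S + Z))
// as long as S + Z still fits the 8-bit START field, because lshr shifts
// in zeros; an ashr could leak sign bits into the extracted field.
```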

The obvious question is, is this actually legal to do?
I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`:
* `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.`
* `A START value exceeding the operand size will not extract any bits from the second source operand.`
* `Only bit positions up to (OperandSize -1) of the first source operand are extracted.`
* `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.`
* `The destination register is cleared if no bits are extracted.`

FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases.

Reviewers: RKSimon, craig.topper, spatel, andreadb

Reviewed By: RKSimon, craig.topper, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54095

llvm-svn: 347048
2018-11-16 13:04:54 +00:00
Alex Bradbury b4a64cede8 [RISCV][NFC] Define and use the new CA instruction format
The RISC-V ISA manual was updated on 2018-11-07 (commit 00557c3) to define a 
new compressed instruction format, RVC format CA (no actual instruction 
encodings were changed). This patch updates the RISC-V backend to define the 
new format, and to use it in the relevant instructions.

Differential Revision: https://reviews.llvm.org/D54302
Patch by Luís Marques.

llvm-svn: 347043
2018-11-16 10:33:23 +00:00
Alex Bradbury 2146e8fb1e [RISCV] Constant materialisation for RV64I
This commit introduces support for materialising 64-bit constants for RV64I,
making use of the RISCVMatInt::generateInstSeq helper in order to share logic
for immediate materialisation with the MC layer (where it's used for the li
pseudoinstruction).

test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit
constant tests. It would be preferable if anyext constant returns were sign
rather than zero extended (see PR39092). This patch simply adds an explicit
signext to the returns in imm.ll.

Further optimisations for constant materialisation are possible, most notably
for mask-like values which can be generated by loading -1 and shifting right.
A future patch will standardise on the C++ codepath for immediate selection on
RV32 as well as RV64, and then add further such optimisations to
RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for
codegen and li expansion.
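
As a concrete illustration of the mask-like case (my example, not code from
the patch):

  #include <cstdint>
  // A constant like 0x00ffffffffffffff needs only two instructions:
  // materialise -1, then shift it right logically.
  uint64_t maskLikeConstant(unsigned n) { // valid for n in [1, 63]
    uint64_t allOnes = ~0ULL; // li   rd, -1
    return allOnes >> n;      // srli rd, rd, n
  }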

Differential Revision: https://reviews.llvm.org/D52962

llvm-svn: 347042
2018-11-16 10:14:16 +00:00
Anton Korobeynikov 411773d227 [MSP430] Add support for .refsym directive
Introduces support for '.refsym' assembler directive.

From GCC docs (for MSP430):
'.refsym' - This directive instructs assembler to add an undefined reference
to the symbol following the directive. No relocation is created for this symbol;
it will exist purely for pulling in object files from archives.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54618

llvm-svn: 347041
2018-11-16 09:50:24 +00:00
Craig Topper 079c37da58 [X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening.
By early promoting the multiply to use an i16 element type we can avoid op legalization emitting a second multiply for the 8 upper elements of the v16i8 type we would otherwise get.

llvm-svn: 347032
2018-11-16 06:15:21 +00:00
Matt Arsenault eabb8dd015 AMDGPU: Fix analyzeBranch failing with pseudoterminators
If a block had one of the _term instructions used for gluing
exec modifying instructions to the end of the block,
analyzeBranch would fail, preventing the verifier from catching
a broken successor list.

llvm-svn: 347027
2018-11-16 05:03:02 +00:00
Craig Topper 5802b82b40 [X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply.
We aren't going to use the upper bits of the multiply result that the extend would affect. So we don't need a specific type of extend.
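
A small demonstration of why the extension kind doesn't matter for the
retained bits (my sketch, not code from the patch):

  #include <cassert>
  #include <cstdint>
  uint8_t mulLowByte(uint8_t a, uint8_t b) {
    uint16_t s = uint16_t(int8_t(a) * int8_t(b)); // sign-extended multiply
    uint16_t z = uint16_t(a * b);                 // zero-extended multiply
    assert(uint8_t(s) == uint8_t(z)); // the low 8 bits agree either way
    return uint8_t(z);
  }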

This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate.

llvm-svn: 347011
2018-11-16 01:16:59 +00:00
Craig Topper 1acafd863f [X86] Update a couple comments to remove a mention of a sign extending that no longer happens. NFC
llvm-svn: 347010
2018-11-16 01:16:51 +00:00
Ron Lieberman cac749ac88 [AMDGPU] Add FixupVectorISel pass, currently supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.

llvm-svn: 347008
2018-11-16 01:13:34 +00:00
Heejin Ahn 095796a391 [WebAssembly] Split BBs after throw instructions
Summary:
The `throw` instruction is a terminator in wasm, but BBs were not split
after `throw` instructions, causing the machine instruction verifier to
fail.

This patch
- Splits BBs after `throw` instructions in WasmEHPrepare and adds an
  unreachable instruction after `throw`, which will be deleted in the
  LateEHPrepare pass
- Refactors WasmEHPrepare into two member functions
- Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass
  to match that of WasmEHPrepare pass, which is newly added. Now
  `eraseBBsAndChildren` does not delete BBs with remaining predecessors.
- Fixes style nits, making static function names conform to clang-tidy
- Re-enables the test temporarily disabled by rL346840 && rL346845

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54571

llvm-svn: 347003
2018-11-16 00:47:18 +00:00
Ron Lieberman 2f5683e6b0 [AMDGPU] NFC Test commit
llvm-svn: 347002
2018-11-16 00:46:51 +00:00
Konstantin Zhuravlyov af7b5d7092 AMDHSA: More code object v3 fixes:
- Make sure IsaInfo::hasCodeObjectV3 returns true only
  for AMDHSA
- Update assembler metadata tests to use v2 by default

llvm-svn: 347001
2018-11-15 23:14:23 +00:00
Craig Topper 22bfa99448 [X86] Remove ANY_EXTEND special case from canReduceVMulWidth
Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? The code coverage report also says it's unused.

I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation?

Differential Revision: https://reviews.llvm.org/D54596

llvm-svn: 346995
2018-11-15 21:19:32 +00:00
Craig Topper b144c7a6fb [X86] Minor cleanup to getExtendInVec. NFCI
Use unsigned to calculate the subvector index to avoid a cast.

Remove an unnecessary condition and replace it with a stronger assert.

Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue.

llvm-svn: 346983
2018-11-15 19:20:22 +00:00
Craig Topper 73bb04ab6f [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us.

In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we already have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs.

Differential Revision: https://reviews.llvm.org/D54512

llvm-svn: 346980
2018-11-15 18:59:31 +00:00
Thomas Lively fc3163b67a [WebAssembly] Fix return type of nextByte
Summary:
The old return type did not allow for correct error reporting and was
causing a compiler warning.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54586

llvm-svn: 346979
2018-11-15 18:56:49 +00:00
Simon Pilgrim 0db8cb0147 [X86] Fix MCNullStreamer support for modules with a CodeView flag
This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag.

The only places changed are the uses of the getTargetStreamer function - this patch guards both of them with null checks.

Committed on behalf of @eush (Eugene Sharygin)

Differential Revision: https://reviews.llvm.org/D54008

llvm-svn: 346962
2018-11-15 15:17:15 +00:00
Alex Bradbury f809d89980 [RISCV] Mark C.EBREAK instruction as having side effects
C.EBREAK was defined with hasSideEffects = 0, which is incorrect and 
inconsistent with the non-compressed instruction form. This patch corrects 
this oversight.

This wouldn't cause codegen issues, as compressed instructions are only ever 
generated by converting the non-compressed form as an MCInst. But having 
correct flags is still worthwhile.

Differential Revision: https://reviews.llvm.org/D54256
Patch by Luís Marques.

llvm-svn: 346959
2018-11-15 14:52:24 +00:00
Alex Bradbury 7727240438 [RISCV] Mark FREM as Expand
Mark the FREM SelectionDAG node as Expand, which is necessary in order to 
support the frem IR instruction on RISC-V. This is expanded into a library 
call. Adds the corresponding test. Previously, this would have triggered an 
assertion at instruction selection time.

Differential Revision: https://reviews.llvm.org/D54159
Patch by Luís Marques.

llvm-svn: 346958
2018-11-15 14:46:11 +00:00
Anton Korobeynikov f0001f4186 Add missed files from prev. commit
llvm-svn: 346949
2018-11-15 12:35:04 +00:00
Anton Korobeynikov 49045c6a0d [MSP430] Add MC layer
Reapply r346374 with the fixes for modules build.

Original summary:

This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA.  Also, more instruction forms are added
to the target description.

Patch by Michael Skvortsov!

llvm-svn: 346948
2018-11-15 12:29:43 +00:00
Alex Bradbury 22c091fc3c [RISCV] Introduce the RISCVMatInt::generateInstSeq helper
Logic to load 32-bit and 64-bit immediates is currently present in
RISCVAsmParser::emitLoadImm in order to support the li pseudoinstruction. With
the introduction of RV64 codegen, there is a greater benefit of sharing
immediate materialisation logic between the MC layer and codegen. The
generateInstSeq helper allows this by producing a vector of simple structs
representing the chosen instructions. This can then be consumed in the MC
layer to produce MCInsts or at instruction selection time to produce
appropriate SelectionDAG node. Sharing this logic means that both the li
pseudoinstruction and codegen can benefit from future optimisations, and
that this logic can be used for materialising constants during RV64 codegen.

This patch does contain a behaviour change: addi will now be produced on RV64
when no lui is necessary to materialise the constant. In that case addiw takes
x0 as the source register, so is semantically identical to addi.

Differential Revision: https://reviews.llvm.org/D52961

llvm-svn: 346937
2018-11-15 10:11:31 +00:00
Craig Topper 553ac560aa [X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization.
This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb.

llvm-svn: 346936
2018-11-15 08:23:40 +00:00
Thomas Lively 77b33c86f5 [WebAssembly] Renumber SIMD bitwise instructions
Summary: Changed to match https://github.com/WebAssembly/simd/pull/54.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54561

llvm-svn: 346931
2018-11-15 03:38:59 +00:00
Konstantin Zhuravlyov a25e0524c0 AMDGPU: Enable code object v3 for AMDHSA only
Differential Revision: https://reviews.llvm.org/D54186

llvm-svn: 346923
2018-11-15 02:32:43 +00:00
Craig Topper ea6ced9d1a [X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization.
The narrow types end up requesting widening, but generic legalization will end up scalarizing and using a build_vector to do the widening.

llvm-svn: 346916
2018-11-15 00:21:41 +00:00
Benjamin Kramer 6b7d6fe079 [X86] Remove unused variable
llvm-svn: 346909
2018-11-14 23:13:27 +00:00
Craig Topper 0b2089da4b [X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization.
On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead.
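
A conceptual C++ model of the manual legalization (my sketch; it only shows
how one f64 store can cover the whole vector):

  #include <cstdint>
  #include <cstring>
  // Store a v2i32 as a single 64-bit FP store by bitcasting, rather than
  // splitting it into two i32 stores.
  void storeV2i32(void *p, uint32_t lo, uint32_t hi) {
    uint64_t bits = (uint64_t(hi) << 32) | lo;
    double d;
    std::memcpy(&d, &bits, sizeof d); // "bitcast" i64 -> f64
    std::memcpy(p, &d, sizeof d);     // one 64-bit store
  }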

There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occurring in 64-bit mode, so it's not a new issue.

llvm-svn: 346908
2018-11-14 23:02:09 +00:00
Jessica Paquette 27e1754fc9 [MachineOutliner][NFC] Don't compute liveness if X16/X17/NZCV are unused
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.

If this holds for all MBBs, then we can avoid checking for liveness on
that candidate. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.

llvm-svn: 346901
2018-11-14 22:23:38 +00:00
Nirav Dave 1241dcb3cf Bias physical register immediate assignments
The machine scheduler currently biases register copies to/from
physical registers to be closer to their point of use / def to
minimize their live ranges. This change extends this to also physical
register assignments from immediate values.

This causes a reduction in overall register pressure and a minor
reduction in spills, and indirectly fixes an out-of-registers
assertion (PR39391).

Most test changes are from minor instruction reorderings and register
name selection changes and direct consequences of that.

Reviewers: MatzeB, qcolombet, myatsina, pcc

Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya,
  javed.absar, arphaman, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54218

llvm-svn: 346894
2018-11-14 21:11:53 +00:00
Aakanksha Patil 1a60116b5c AMDGPU: Additional pattern for i16 median3 matching
min(max(a, b), max(min(a, b), c))
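
A scalar C++ rendering of the matched expression shows why it is the median
of three:

  #include <algorithm>
  int med3(int a, int b, int c) {
    // max(min(a,b), c) clamps c from below by min(a,b); the outer min
    // then clamps from above by max(a,b), leaving the middle value.
    return std::min(std::max(a, b), std::max(std::min(a, b), c));
  }
  // med3(1, 5, 0) == 1, med3(1, 5, 3) == 3, med3(1, 5, 7) == 5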

Differential Revision: https://reviews.llvm.org/D54494

llvm-svn: 346886
2018-11-14 20:10:41 +00:00
Craig Topper 6c94264b1f [X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization
Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs.

Differential Revision: https://reviews.llvm.org/D54513

llvm-svn: 346879
2018-11-14 18:16:21 +00:00
Simon Pilgrim cdb170794b [CostModel] Add generic expansion funnel shift cost support
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.

This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.

llvm-svn: 346854
2018-11-14 12:24:50 +00:00
Simon Pilgrim 7501780ec6 [X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG
This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments.

The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although it's annoying that we lose the {k}{z} instruction test coverage, as they are tricky to generate...

Differential Revision: https://reviews.llvm.org/D54083

llvm-svn: 346850
2018-11-14 11:26:35 +00:00
Heejin Ahn da419bdb5e [WebAssembly] Add support for the event section
Summary:
This adds support for the 'event section' specified in the exception
handling proposal. (This was named 'exception section' first, but later
renamed to 'event section' to take possibilities of other kinds of
events into consideration. But currently we only store exception info in
this section.)

The event section is added between the global section and the export
section. This is for ease of validation per request of the V8 team.

This patch:
- Creates the event symbol type, which is a weak symbol
- Makes 'throw' instruction take the event symbol '__cpp_exception'
- Adds relocation support for events
- Adds WasmObjectWriter / WasmObjectFile (Reader) support
- Adds obj2yaml / yaml2obj support
- Adds '.eventtype' printing support

Reviewers: dschuff, sbc100, aardappel

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54096

llvm-svn: 346825
2018-11-14 02:46:21 +00:00
Zi Xuan Wu 6a3c279d1c [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
Make ISD::VSELECT legal as long as AltiVec instructions are available; otherwise its default behavior is Expand,
which is legalized at the type-legalization phase. Use xxsel to match vselect when VSX is enabled, and vsel otherwise.


Differential Revision: https://reviews.llvm.org/D49531

llvm-svn: 346824
2018-11-14 02:34:45 +00:00
Jessica Paquette 4e97ec94d9 [MachineOutliner][NFC] Use flags set in all candidates to check for calls
If we keep track of if the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.

This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example.)

llvm-svn: 346816
2018-11-13 23:41:31 +00:00
Jessica Paquette cad864d49e [MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.

The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.

llvm-svn: 346809
2018-11-13 23:01:34 +00:00
Jessica Paquette b2d53c5d7d [MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there are fewer than that.

llvm-svn: 346803
2018-11-13 22:16:27 +00:00
Stanislav Mekhanoshin bcb34ac2ea [AMDGPU] combine extractelement into several selects
An extractelement with a non-constant index will be lowered either to
scratch or a movrel loop in most cases. This patch converts such an
instruction into a set of selects if the vector size is not too big.
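
Conceptually (a scalar sketch of the transform, not the actual DAG code):

  // A dynamic-index extract from a 4-element vector becomes a short
  // chain of compare+select operations instead of scratch or a loop.
  float extractDynamic(const float v[4], unsigned idx) {
    float r = v[0];
    r = (idx == 1) ? v[1] : r;
    r = (idx == 2) ? v[2] : r;
    r = (idx == 3) ? v[3] : r;
    return r;
  }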

Differential Revision: https://reviews.llvm.org/D54351

llvm-svn: 346800
2018-11-13 21:18:21 +00:00
Craig Topper aca8390216 [SelectionDAG][X86] Relax restriction on the width of an input to *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes
Previously, the extend_vector_inreg opcodes required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.

This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.

X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.

I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.

The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.

Differential Revision: https://reviews.llvm.org/D54346

llvm-svn: 346784
2018-11-13 19:45:21 +00:00
Sam Clegg f98ba05f3d [WebAssembly] Fix broken assumption that all bitcasts are to function types
Specifically, we can bitcast to void.

Fixes PR39591

Differential Revision: https://reviews.llvm.org/D54447

llvm-svn: 346778
2018-11-13 19:14:02 +00:00
Simon Pilgrim e827fe09b3 [CostModel][X86] Fix constant vector XOP rights shifts
We'll constant fold these cases so they are as cheap as vector left shift cases.

Noticed while improving funnel shift costs.

llvm-svn: 346760
2018-11-13 16:40:10 +00:00
Simon Pilgrim 72a7fbc1a3 Fix comment for XOP rotates. NFCI.
llvm-svn: 346753
2018-11-13 12:09:27 +00:00
Alexander Richardson 4eb93907f7 Fix modules build of AVRAsmParser.cpp
Summary:
Without this change I get the following error:

lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant]

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53425

llvm-svn: 346750
2018-11-13 10:54:44 +00:00
Jonas Paulsson f9b2b5e67e [SystemZ] Increase the number of VLREPs
If a loaded value is replicated it is best to combine these two operations
into a VLREP (load and replicate), but isel will not produce this if the load
has other users as well.

This patch handles this by making the other users of the load use element 0
of the REPLICATE instead of the load. This way the load has only the
REPLICATE node as its user, and we get a VLREP.

Review: Ulrich Weigand
https://reviews.llvm.org/D54264

llvm-svn: 346746
2018-11-13 08:37:09 +00:00
Jessica Paquette 106946329d [MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.

llvm-svn: 346721
2018-11-13 00:32:09 +00:00
Jessica Paquette 82d9c0a3fa [MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom
Instead of returning Flags, return true if the MBB is safe to outline from.

This lets us check for unsafe situations, like say, in AArch64, X17 is live
across a MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.

llvm-svn: 346718
2018-11-12 23:51:32 +00:00
Simon Pilgrim e565e5a962 [X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387)
This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location.

Differential Revision: https://reviews.llvm.org/D54267

llvm-svn: 346706
2018-11-12 21:12:38 +00:00
Aakanksha Patil a992c694c6 AMDGPU: Adding more median3 patterns
min(max(a, b), max(min(a, b), c)) -> med3 a, b, c

Differential Revision: https://reviews.llvm.org/D54331

llvm-svn: 346704
2018-11-12 21:04:06 +00:00
Wouter van Oortmerssen cc75e77df5 [WebAssembly] Added WasmAsmParser.
Summary:
This is to replace the ELFAsmParser that WebAssembly was using, which
so far was a stub that didn't do anything, and couldn't work correctly
with wasm.

This new class is there to implement generic directives related to
wasm as a binary format. Wasm target specific directives are still
parsed in WebAssemblyAsmParser as before. The two classes now
cooperate more correctly too.

Also implemented .result which was missing. Any unknown directives
will now result in errors.

Reviewers: dschuff, sbc100

Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54360

llvm-svn: 346700
2018-11-12 20:15:01 +00:00
Craig Topper c48712b341 [X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS.
Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.

This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not.

llvm-svn: 346697
2018-11-12 19:37:29 +00:00
Stanislav Mekhanoshin e86c8d33b1 [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z
Sometimes after basic block placement we end up with a code like:

  sreg = s_mov_b64 -1
  vcc = s_and_b64 exec, sreg
  s_cbranch_vccz

This happens as a join of a block assigning -1 to a saved mask and
another block which consumes that saved mask with s_and_b64 and a
branch.

This is essentially a single s_cbranch_execz instruction when moved
into a single new basic block.

Differential Revision: https://reviews.llvm.org/D54164

llvm-svn: 346690
2018-11-12 18:48:17 +00:00
Simon Pilgrim 93c64e5c76 [CostModel][X86] Add funnel shift rotation special case costs
When we repeat the 2 shifting operands this is a bit rotation - annoyingly this has to be done in a different getIntrinsicInstrCost overload than most intrinsics, as we need to check that the operands are the same.
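
That is, fshl(x, x, s) is rotl(x, s); a scalar illustration:

  #include <cstdint>
  uint32_t rotl32(uint32_t x, unsigned s) {
    s &= 31;                                  // shift amount taken mod 32
    return (x << s) | (x >> ((32 - s) & 31)); // well-defined even at s == 0
  }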

llvm-svn: 346688
2018-11-12 18:27:54 +00:00
Simon Pilgrim 49e93d2f0e [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs
The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level

llvm-svn: 346683
2018-11-12 17:56:59 +00:00
Simon Pilgrim f4cd292ba2 [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector
llvm-svn: 346664
2018-11-12 15:48:06 +00:00
Jonas Paulsson 5cea85dd59 [SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.

This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).

Review: Ulrich Weigand
https://reviews.llvm.org/D54423

llvm-svn: 346663
2018-11-12 15:32:27 +00:00
Alex Bradbury 9c03e4cacd [RISCV] Support .option relax and .option norelax
This extends the .option support from D45864 to enable/disable the relax 
feature flag from D44886

During parsing of the relax/norelax directives, the RISCV::FeatureRelax 
feature bits of the SubtargetInfo stored in the AsmParser are updated 
appropriately to reflect whether relaxation is currently enabled in the 
parser. When an instruction is parsed, the parser checks if relaxation is 
currently enabled and if so, gets a handle to the AsmBackend and sets the 
ForceRelocs flag. The AsmBackend uses a combination of the original 
RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the 
ForceRelocs flag to determine whether to emit relocations for symbol and 
branch diffs. Diff relocations are therefore omitted only when the relax flag
was not set on the command line and no instruction was ever parsed in a
section with relaxation enabled, which ensures correct diffs are emitted.

Differential Revision: https://reviews.llvm.org/D46423
Patch by Lewis Revill.

llvm-svn: 346655
2018-11-12 14:25:07 +00:00
Jonas Paulsson c0ee028dc3 [SystemZ] Replicate the load with most uses in buildVector()
Iterate over all elements and count the number of uses among them for each
used load. Then make sure to REPLICATE the load which has the most uses in
order to minimize the number of needed element insertions.
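
A hypothetical sketch of the selection logic (names here are illustrative,
not the actual SystemZ code):

  #include <unordered_map>
  #include <vector>
  // Among the loads feeding a build_vector, return the one with the most
  // uses; that is the load worth turning into a REPLICATE.
  template <typename LoadT>
  LoadT *pickMostUsedLoad(const std::vector<LoadT *> &Elems) {
    std::unordered_map<LoadT *, int> Uses;
    LoadT *Best = nullptr;
    int BestCount = 0;
    for (LoadT *L : Elems) {
      if (!L) continue; // element not defined by a load
      if (++Uses[L] > BestCount) { BestCount = Uses[L]; Best = L; }
    }
    return Best;
  }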

Review: Ulrich Weigand
https://reviews.llvm.org/D54322

llvm-svn: 346637
2018-11-12 08:12:20 +00:00
Craig Topper 2eab39f77b [X86] Use DAG.getConstant instead of getZeroVector.
llvm-svn: 346605
2018-11-11 07:24:36 +00:00
Craig Topper ef33a190bc [X86] Replace calls to getOnesVector/getZeroVector with getConstant.
getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization.

llvm-svn: 346603
2018-11-11 01:40:04 +00:00
Sanjay Patel 0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00
Benjamin Kramer 37c691e867 [X86] Remove unused variable
llvm-svn: 346592
2018-11-10 18:11:11 +00:00
Craig Topper 7956a256e9 [X86] Remove apparently unneeded code from combineVSZext.
No lit tests fail with this code removed.

This is a pre-commit for D54346.

llvm-svn: 346590
2018-11-10 17:44:28 +00:00
Simon Pilgrim d3ca710ec9 [CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615)
llvm-svn: 346589
2018-11-10 17:37:52 +00:00
Roman Lebedev b428b8b214 [X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465)
There are two AGU units, and per 1cy, there can be either two loads,
or a load and a store; but not two stores, or two loads and a store.

Additionally, loads shouldn't affect the store scheduler and vice versa.
(but *should* affect the PdEX scheduler.)

Required rL346545.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39465

llvm-svn: 346587
2018-11-10 14:31:43 +00:00
Craig Topper a1b6667c6a [X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem.
The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them.

llvm-svn: 346581
2018-11-10 06:04:33 +00:00
Craig Topper 0364085281 [X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH.
This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems.

While there also use DAG.getConstant to create a 0 vector instead of using the helper the forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own.

llvm-svn: 346574
2018-11-10 00:26:42 +00:00
Thomas Lively 936734b777 [WebAssembly] Update bleeding-edge cpu features
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D54362

llvm-svn: 346570
2018-11-10 00:11:14 +00:00
Eli Friedman ad1151cf6a [ARM64] [Windows] Handle funclets
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.

Patch by Sanjin Sijaric, with some fixes by me.

Differential Revision: https://reviews.llvm.org/D51524

llvm-svn: 346568
2018-11-09 23:33:30 +00:00
Eli Friedman 0bbb0d0720 [ARM] Add MemOperand to LDRcp to enable DCE.
LDRcp should be deleted when the dest register is dead in register
coalescing. Without MemOp, dead LDRcp will cause dead constant pool
value which references to non-existing label.

Patch by Yin Ma.

Differential Revision: https://reviews.llvm.org/D54173

llvm-svn: 346563
2018-11-09 23:09:17 +00:00
Craig Topper 17d64c71c5 [X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx.
With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes, and with that switch the extend+truncate would no longer both be emitted during lowering.

This patch changes the implementation to what will be necessary with that patch, which helps minimize test diffs.

llvm-svn: 346552
2018-11-09 20:09:53 +00:00
Bryan Chan 123553921f [AArch64] Support HiSilicon's TSV110 processor
Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls

Reviewed By: kristof.beyls

Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D53908

llvm-svn: 346546
2018-11-09 19:32:08 +00:00
Fangrui Song 60b7fb46e1 [Hexagon] Fix some -Wunused-function with LLVM_DUMP_METHOD and -Wunused-variable
llvm-svn: 346543
2018-11-09 19:24:48 +00:00
Craig Topper 731ea7dbc1 [X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded.
This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND.

I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch.

llvm-svn: 346539
2018-11-09 19:05:51 +00:00
Simon Pilgrim fc8f1d7da7 [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
2018-11-09 19:04:27 +00:00
Jordan Rupprecht c1741a5a8a [Hexagon] Fix unused variable warning in release builds
llvm-svn: 346537
2018-11-09 18:54:27 +00:00
Fangrui Song 4955066366 [WebAssembly] Hotfix of WebAssemblyInstructionTableSize after rL346465
llvm-svn: 346535
2018-11-09 18:32:20 +00:00
Brendon Cahoon ac8fed68d5 [Hexagon] Implement noreturn optimization
Eliminate the stack frame in functions with the noreturn nounwind
attributes, and when the noreturn-stack-elim target feature is
enabled. This reduces the code and stack space needed for noreturn
functions.
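
For illustration, the kind of function this targets (my example; the
optimization itself is gated on the noreturn-stack-elim feature):

  #include <cstdio>
  #include <cstdlib>
  [[noreturn]] void fatal(const char *msg) {
    std::fputs(msg, stderr);
    std::abort(); // never returns, so no epilogue or stack restore is needed
  }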

Differential Revision: https://reviews.llvm.org/D54210

llvm-svn: 346532
2018-11-09 18:16:24 +00:00
Stanislav Mekhanoshin 13d3371e68 [AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx
This only covers AMDGPU BE, hopefully all occurrences.

Differential Revision: https://reviews.llvm.org/D54235

llvm-svn: 346528
2018-11-09 17:58:59 +00:00
Krzysztof Parzyszek 8567de0871 [Hexagon] Place globals with explicit .sdata section in small data
Both -fPIC and -G0 disable placement of globals in small data section,
but if a global has an explicit section assignment placing it in small
data, it should go there anyway.

llvm-svn: 346523
2018-11-09 17:31:22 +00:00
Zaara Syeda 5c179bf14b [Power9] Allow gpr callee saved spills in prologue to vectors registers
Currently in LLVM, CalleeSavedInfo can only assign a callee saved register to
a stack frame index to be spilled in the prologue. We would like to enable
spilling gprs to vector registers. This patch adds the capability to spill to
other registers aside from just the stack. It also adds the changes for power9
to spill gprs to volatile vector registers when they are available.
This happens only for leaf functions when using the option
-ppc-enable-pe-vector-spills.

Differential Revision: https://reviews.llvm.org/D39386

llvm-svn: 346512
2018-11-09 16:36:24 +00:00
Alexey Bataev 93d018a916 Revert "[DEBUGINFO, NVPTX]DO not emit ',debug' option if no debug info or only debug directives are requested."
This reverts commit r345972. Need to update the description and possibly
the patch itself after discussion with Eric Christopher.

llvm-svn: 346508
2018-11-09 16:22:35 +00:00
Jonas Paulsson 458b7c0b39 [SystemZ] Avoid inserting same value after replication
A minor improvement of buildVector() that skips creating an
INSERT_VECTOR_ELT for a Value which has already been used for the
REPLICATE.

Review: Ulrich Weigand
https://reviews.llvm.org/D54315

llvm-svn: 346504
2018-11-09 15:44:28 +00:00
Sam Parker 2804f32ec4 [ARM] Don't promote i1 types in ARM CGP
Now that we have mixed type sizes, i1 values need to be explicitly
handled as we want to avoid promoting these values.

Differential Revision: https://reviews.llvm.org/D54308

llvm-svn: 346499
2018-11-09 15:06:33 +00:00
Sanjay Patel fa1c0fe478 [x86] try to form broadcast before widening shuffle elements
I noticed that we weren't generating broadcasts as much I thought we would with 
D54271, and this is part of the problem.

Widening the shuffle elements means adding bitcasts and hiding the relationship 
between a splatted scalar and the vector. If we can form a broadcast, do that 
before going through the rest of the shuffle lowering because broadcasts should 
be cheap and can often be load-folded.

Differential Revision: https://reviews.llvm.org/D54280

llvm-svn: 346498
2018-11-09 14:54:58 +00:00
Alex Bradbury 1cc2d0b9fb [RISCV] Avoid unnecessary XOR for seteq/setne 0
Differential Revision: https://reviews.llvm.org/D53492

Patch by James Clarke.

llvm-svn: 346497
2018-11-09 14:47:36 +00:00
Petar Avramovic 2cefaa2747 [MIPS GlobalISel] narrowScalar G_CONSTANT
Legalize s64 G_CONSTANT using narrowScalar on MIPS 32.

Differential Revision: https://reviews.llvm.org/D54255

llvm-svn: 346495
2018-11-09 14:21:16 +00:00
Simon Pilgrim ea51f98b9b [X86] Add Subtarget to more lowerVectorShuffle functions. NFCI.
This will be necessary for an update to D54267

llvm-svn: 346490
2018-11-09 13:19:03 +00:00
Clement Courbet eee2e06e2a [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target.
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.

Reviewers: gchatelet, john.brawn, jsji

Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54297

llvm-svn: 346489
2018-11-09 13:15:32 +00:00
Clement Courbet e6b727e552 [X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SKL,SKX.
Summary:
Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources.
After HSW, it also has zero latency.

This fixes PR35606.

To reproduce:
Uops:
  llvm-exegesis -mode=uops -opcode-name=VZEROUPPER
Latency:
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=-

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D54107

llvm-svn: 346482
2018-11-09 09:49:06 +00:00
Sam Parker 08979cd125 [ARM] Enable mixed types in ARM CGP
Previously, during the search, all values had to have the same
'TypeSize', which is equal to number of bits of the integer type of
the icmp operand. All values in the tree had to match this size;
meaning that, if we searched from i16, we wouldn't accept i8s. A
change in type size requires zext and truncs to perform the casts so,
to allow mixed narrow types, the handling of these instructions is
now slightly different:

- we allow casts if their result or operand is <= TypeSize.
- zexts are sinks if their result > TypeSize.
- truncs are still sinks if their operand == TypeSize.
- truncs are still sources if their result == TypeSize.

The transformation bails on finding an icmp that operates on data
smaller than the current TypeSize.

Differential Revision: https://reviews.llvm.org/D54108

llvm-svn: 346480
2018-11-09 09:28:27 +00:00
Sam Parker 453ba916a0 [ARM] Small reorganisation in ARMParallelDSP
A few code movement things:

- AreSymmetrical is now a method of BinOpChain.
- Created a lambda in CreateParallelMACPairs to reduce loop nesting.
- A Reduction object now gets passed in a couple of places instead,
  including CreateParallelMACPairs, so it doesn't need to return a
  value.
I've also added RecordSequentialLoads, which is run before the
transformation begins, and caches the interesting loads. This can then
be queried later instead of cross checking many load values.

Differential Revision: https://reviews.llvm.org/D54254

llvm-svn: 346479
2018-11-09 09:18:00 +00:00
Mandeep Singh Grang 397765bc51 [COFF, ARM64] Add support for MSVC buffer security check
Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D54248

llvm-svn: 346469
2018-11-09 02:48:36 +00:00
Thomas Lively 2faf079494 [WebAssembly] Read prefixed opcodes as ULEB128s
Summary: Depends on D54126.

Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54138

llvm-svn: 346465
2018-11-09 01:57:00 +00:00
Thomas Lively 4ddd22581e [WebAssembly][NFC] Reorder SIMD section
Summary:
Reorders the sections in the SIMD tablegen file to roughly match the
new opcode ordering. Depends on D54126.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54134

llvm-svn: 346464
2018-11-09 01:49:19 +00:00
Thomas Lively 299d214aba [WebAssembly] Renumber and LEB128-encode SIMD opcodes
Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54126

llvm-svn: 346463
2018-11-09 01:45:56 +00:00
Thomas Lively 38c902bc2e [WebAssembly] Lower select for vectors
Summary:

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53675

llvm-svn: 346462
2018-11-09 01:38:44 +00:00
Heejin Ahn 0c68a875fa [WebAssembly] Fix LowerEmscriptenEHSjLj when there's only longjmp
Summary:
The pass incorrectly assumed if there's a longjmp declaration in the
module, there is also a setjmp function declaration. Fixed it, and now
the pass only converts longjmp and does not do any other transformation
when there's no setjmp declaration in the module.

Fixes PR39562.

Reviewers: jgravelle-google, sbc100

Subscribers: dschuff, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54273

llvm-svn: 346445
2018-11-08 22:56:26 +00:00
Sanjay Patel b5535dc7b3 [x86] use shuffles for scalar insertion into high elements of a constant vector
As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly.

Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm:

// If the vector is wider than 128 bits, extract the 128-bit subvector, insert
// into that, and then insert the subvector back into the result.

...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering.

Differential Revision: https://reviews.llvm.org/D54271

llvm-svn: 346433
2018-11-08 19:16:27 +00:00
Davide Italiano ac8279ab8b Revert "[MSP430] Add MC layer"
This commit broke the module buildbots.
Error:

lib/Target/MSP430/MSP430GenAsmMatcher.inc:1027:1: error: redundant
namespace 'llvm' [-Wmodules-import-nested-redundant]
^

llvm-svn: 346410
2018-11-08 16:21:29 +00:00
Jonas Paulsson 1993894c03 [SystemZ] Bugfix in shouldCoalesce()
It was discovered in randomized testing that the SystemZ implementation of
shouldCoalesce() could be caused to crash when subreg liveness was
enabled. This was because an undef use of the virtual register was copied
outside current MBB at the point of shouldCoalesce() being called. For more
details, see https://bugs.llvm.org/show_bug.cgi?id=39276.

This patch changes the check for MBB locality from livein/liveout checks to
do checks for all instructions of both intervals being inside MBB. This
avoids the cases with dead defs / undef uses outside MBB, which are not
affecting liveness in/out of MBB.

The original test case is included as a reduced .mir test case.

Review: Ulrich Weigand
https://reviews.llvm.org/D54197

llvm-svn: 346406
2018-11-08 15:29:48 +00:00
Petr Pavlu 7c84b2e3ab [ARM] Enable spilling of the hGPR register class in Thumb2
Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and
loadRegToStackSlot() to allow the GPR class or any of its sub-classes
(including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12.

Differential Revision: https://reviews.llvm.org/D51927

llvm-svn: 346401
2018-11-08 13:02:10 +00:00
Anton Korobeynikov 5eb3d339d3 [MSP430] Fix encodeInstruction() for big endian hosts
Reviewers: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54251

llvm-svn: 346391
2018-11-08 10:17:52 +00:00
Thomas Lively 897171902b [WebAssembly] Add V128 to WebAssemblyInstrInfo::copyPhysReg
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53872

llvm-svn: 346384
2018-11-08 02:35:28 +00:00
Stanislav Mekhanoshin 6cc8b2fc65 [AMDGPU] Extend promote alloca vectorization
Promote alloca can vectorize a small array by bitcasting it to a
vector type. Extend vectorization for the case when alloca is
already a vector type. We still want to replace GEPs with an
insert/extract element instructions in this case.

Differential Revision: https://reviews.llvm.org/D54219

llvm-svn: 346376
2018-11-08 00:16:23 +00:00
Anton Korobeynikov 09dff53840 [MSP430] Add MC layer
Summary:
This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA.  Also, more instruction forms are added
to the target description.

Reviewers: asl

Reviewed By: asl

Subscribers: pftbest, krisb, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D53661

llvm-svn: 346374
2018-11-08 00:03:45 +00:00
Eli Friedman 0917d0c80c [AArch64] [Windows] Address post-commit review comment on r346358.
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.

llvm-svn: 346366
2018-11-07 22:30:56 +00:00
Nicolai Haehnle bc233f5523 Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"
This reverts commit r344696 for now (except for some test additions).

See https://bugs.freedesktop.org/show_bug.cgi?id=108611.

llvm-svn: 346364
2018-11-07 21:53:43 +00:00
Nicolai Haehnle 61396ff67c AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI)
Summary: Remove redundant logic and simplify control flow.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54086

llvm-svn: 346363
2018-11-07 21:53:36 +00:00
Nicolai Haehnle 0ab31c9c44 AMDGPU/InsertWaitcnts: Remove kill-related logic
Summary:
This is not needed, because we don't actually insert relevant branches
for KILLs that late in the compilation flow.

Besides, this was always checking for the wrong kill opcode anyway...

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54085

llvm-svn: 346362
2018-11-07 21:53:29 +00:00
Konstantin Zhuravlyov 15e90e331c AMDGPU/NFC: Split FLAT_Global_Atomic_Pseudo into RTN/NO_RTN multiclasses
llvm-svn: 346361
2018-11-07 21:42:13 +00:00
Eli Friedman d00fb2e0a8 [AArch64] [Windows] Trap after noreturn calls.
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.

Differential Revision: https://reviews.llvm.org/D54129

llvm-svn: 346358
2018-11-07 21:31:14 +00:00
Konstantin Zhuravlyov 7f1959ebb3 AMDGPU/NFC: Split MUBUF_Pseudo_Atomics into RTN/NO_RTN multiclasses
llvm-svn: 346357
2018-11-07 21:21:32 +00:00
Eli Friedman 7d7d41debc [ARM] Fix CPSR liveness in tMOVCCr_pseudo lowering.
The lowering was missing live-ins in certain cases, like a sequence of
multiple tMOVCCr_pseudo instructions.  This would lead to a verifier
failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered.

For reasons I don't completely understand, it's hard to get a sequence
of multiple tMOVCCr_pseudo instructions; the issue only seems to show up
with 64-bit comparisons where the result is zero-extended. I added some
extra testcases in case that changes in the future. Probably some
optimization opportunities here if anyone is interested. (@test_slt_not
is the case that was getting miscompiled.)

The code to check the liveness of CPSR was stolen from
X86ISelLowering.cpp; maybe it could be refactored into a common helper,
but I have no idea where to put it.

Differential Revision: https://reviews.llvm.org/D54192

llvm-svn: 346355
2018-11-07 21:08:13 +00:00
Matt Arsenault 8ba740a5a8 Allow subclassing ExternalAA
This allows testing AMDGPU alias analysis like any
other alias analysis pass. This fixes the existing
test pointlessly running opt -O3 when it really
just wants to run the one analysis.

Before there was no way to test this using -aa-eval
with opt, since the default constructed pass
is run. The wrapper subclass allows the
default constructor to pass the necessary callback.

llvm-svn: 346353
2018-11-07 20:26:42 +00:00
Than McIntosh 5bcdea5118 [X86] improve split-stack machine BB placement
Summary:
The conditional branch created to support -fsplit-stack for X86 is
left unbiased/unhinted, resulting in less than ideal block placement:
the __morestack call block is kept on the main hot path. Bias the
branch to insure that the stack allocation block is treated as a
"cold" block during machine basic block placement.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54123

llvm-svn: 346336
2018-11-07 17:41:57 +00:00
Sanjay Patel de58e93666 fix typos aggressively; NFC
llvm-svn: 346316
2018-11-07 14:35:36 +00:00
Andrea Di Biagio 4ae974e745 [X86][FixupLEA] Avoid checking target features for every single processed instruction. NFCI
llvm-svn: 346309
2018-11-07 12:26:00 +00:00
Petar Avramovic 2624c8db68 [MIPS GlobalISel] Set operand order for G_MERGE and G_UNMERGE
Set the operand order for G_MERGE_VALUES and G_UNMERGE_VALUES so
that least significant bits always go first, regardless of endianness.
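
A scalar model of the chosen order (my sketch):

  #include <cstdint>
  // G_UNMERGE_VALUES-style split: the least significant part is always
  // operand 0, independent of the target's endianness.
  void unmerge64(uint64_t v, uint32_t out[2]) {
    out[0] = uint32_t(v);       // least significant half first
    out[1] = uint32_t(v >> 32); // most significant half second
  }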

Differential Revision: https://reviews.llvm.org/D54098

llvm-svn: 346305
2018-11-07 11:45:43 +00:00
Evandro Menezes f1a0d93b1d [PATCH] [AArch64] Refactor helper functions (NFC)
Refactor helper functions in AArch64InstrInfo to be static methods.

llvm-svn: 346273
2018-11-06 22:17:14 +00:00
Yaxun Liu 73bf0af32f AMDGPU: Add an option -disable-promote-alloca-to-lds
Add this option for debugging and providing workaround.

By default it is off so no behavior change in backend.

Differential Revision: https://reviews.llvm.org/D54158

llvm-svn: 346267
2018-11-06 21:28:17 +00:00
Craig Topper 6428a2cd9a [X86] Add custom promotion of v2i8/v2i16 fp_to_sint to avoid over promotion to v2i64 which would force scalarization.
llvm-svn: 346259
2018-11-06 19:24:21 +00:00
Matthias Braun c6613879ce LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysReg. The latter is only 16 bits and
saves us a bit of memory.

llvm-svn: 346254
2018-11-06 19:00:11 +00:00
Simon Atanasyan bb36aea1d5 [mips] Support sigrie instruction
The `sigrie` instruction signals a Reserved Instruction Exception.
This patch adds support for assembling / disassembling the instruction.

Differential Revision: http://reviews.llvm.org/D53861

llvm-svn: 346230
2018-11-06 14:37:24 +00:00
Clement Courbet 54a1184fff [X86][NFC] Fix comment.
llvm-svn: 346226
2018-11-06 13:48:56 +00:00
Matthias Braun 96d12513a1 AArch64: Cleanup CCMP code; NFC
Cleanup CCMP pattern matching code in preparation for review/bugfix:
- Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()`
  (it won't accept arbitrary disjunctions and is really about whether we
   can transform the subtree into a conjunction that we can emit).
- Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()`

llvm-svn: 346203
2018-11-06 03:15:22 +00:00
Sam Clegg 5292d17ec8 Revert "[WebAssembly] Fixup `main` signature by default"
This reverts rL345880.  It caused some test failures on the
webassembly waterfall, e.g. binaryen2.test_mainenv fails due to
the fact that `envp` ends up being undef rather than 0.

Differential Revision: https://reviews.llvm.org/D54117

llvm-svn: 346187
2018-11-06 00:31:02 +00:00
Matthias Braun 7a75a91b5b MachineFunction: Store more specific reference to LLVMTargetMachine; NFC
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.

Do the same for references in ScheduleDAG and RegUsageInfoCollector.

llvm-svn: 346183
2018-11-05 23:49:14 +00:00
Craig Topper 0b5f8169b0 [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.

llvm-svn: 346180
2018-11-05 23:26:13 +00:00
Konstantin Zhuravlyov 108927b944 AMDGPU: Add sram-ecc feature
Differential Revision: https://reviews.llvm.org/D53222

llvm-svn: 346177
2018-11-05 22:44:19 +00:00
Craig Topper def82a81af [X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead.
SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel.

I don't have a reduced test case for such an infinite loop yet.

llvm-svn: 346170
2018-11-05 22:08:17 +00:00
Zaara Syeda 7509880b54 [Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be

This patch adds support for these.

Differential Revision: https://reviews.llvm.org/D53581

llvm-svn: 346148
2018-11-05 17:31:26 +00:00
Stefan Maksimovic 8d7c351799 [Mips] Supplement long branch pseudo instructions
Expand on the LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo
instructions by creating variants which support
fewer operands or accept GPR64Opnds as their operand, in order
to appease the machine verifier pass.

Differential Revision: https://reviews.llvm.org/D53977

llvm-svn: 346133
2018-11-05 14:37:41 +00:00
Neil Henning 233a02d0ed [AMDGPU] Fix the new atomic optimizer in pixel shaders.
The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes active.

To fix this we add an llvm.amdgcn.ps.live call that guards a branch
around the entire atomic operation - ensuring that all helper lanes are
inactive within the wavefront when we compute our atomic results.

I've added a test case that can cause derivatives, and exposes the
problem.

Differential Revision: https://reviews.llvm.org/D53930

llvm-svn: 346128
2018-11-05 12:04:48 +00:00
Sam Parker fec793c98f [ARM] Turn assert into condition in ARMCGP
Turn the assert in PrepareConstants into a condition so that we can
handle mul instructions with negative immediates.

Differential Revision: https://reviews.llvm.org/D54094

llvm-svn: 346126
2018-11-05 11:26:04 +00:00
Sam Parker fcd8adab30 [ARM][ARMCGP] Remove unecessary zexts and truncs
r345840 slightly changed the way promotion happens which could
result in zext and truncs having the same source and destination
types. This fixes that issue.

We can now also remove the zext and trunc in the following case:
(zext (trunc (promoted op)), i32)

This means that we can no longer treat a value that is only used by
a sink as safe to promote.

I've also added some extra asserts and replaced a cast with a
dyn_cast.

Differential Revision: https://reviews.llvm.org/D54032

llvm-svn: 346125
2018-11-05 10:58:37 +00:00
Dylan McKay 4c5a5c8db6 [AVR] Fix a backend bug that left extraneous operands after expansion
This patch fixes a bug in the AVR FRMIDX expansion logic.

The expansion would leave a leftover operand from the original FRMIDX,
but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction
did not expect this operand and so LLVM rejected the machine
instruction.

This would trigger an assertion:

    Assertion failed: ((isImpReg || Op.isRegMask() || MCID->isVariadic() ||
                        OpNo < MCID->getNumOperands() || isMetaDataOp) &&
                        "Trying to add an operand to a machine instr that is already done!"),
    function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp

Tim fixed this so that now the FRMIDX is expanded correctly into
a well-formed MOVWRdRr.

Patch by Tim Neumann

llvm-svn: 346117
2018-11-05 05:49:04 +00:00
Craig Topper 30b627e5c9 [X86] Custom type legalize v2i8/v2i16/v2i32 mul to use pmuludq.
v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't care about the upper bits of the type-legalized multiply, we can use pmuludq to produce the multiply result for the bits we do care about.
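As an illustration (a standalone scalar model of one lane, not the backend code): PMULUDQ reads only the low 32 bits of each 64-bit lane, so whatever promotion leaves in the high bits cannot affect the low 32 bits of the product we keep.

```
#include <cstdint>
#include <cstdio>

// Model of one 64-bit lane of PMULUDQ: multiply the low 32 bits of
// each lane and produce the full 64-bit product.
static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
  return (uint64_t)(uint32_t)a * (uint32_t)b;
}

int main() {
  // A v2i32 element promoted to i64: the high halves hold garbage.
  uint64_t a = 0xDEADBEEF00000007ULL; // low 32 bits = 7
  uint64_t b = 0xCAFEBABE00000005ULL; // low 32 bits = 5
  printf("%u\n", (uint32_t)pmuludq_lane(a, b)); // prints 35
  return 0;
}
```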

llvm-svn: 346115
2018-11-05 05:02:12 +00:00
Dylan McKay 9a9ae99b30 [AVR] Disallow the LDDWRdPtrQ instruction with Z as the destination
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.

The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.

When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:

  LLVM ERROR: ran out of registers during register allocation

This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.

Here is a description from the patch author, Peter Nimmervoll:

  As far as I understand, the problem occurs when LDDWRdPtrQ uses
  the ptrdispregs register class as target register. This should work, but
  the allocator can't deal with this for some reason. So from my testing,
  it seems like (and I might be totally wrong on this) the allocator reserves
  the Z register for the ICALL instruction and then the register class
  ptrdispregs only has 1 register left and we can't use Y for source and
  destination. Removing the Z register from DREGS fixes the problem but
  removing Y register does not.

More information about the bug can be found on the avr-rust issue
tracker at https://github.com/avr-rust/rust/issues/37.

A bug has been raised to track the removal of this workaround and a proper
fix: PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.

Patch by Peter Nimmervoll

llvm-svn: 346114
2018-11-05 05:00:44 +00:00
Craig Topper ed6a0a817f [X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode.
Summary: This also enables some constant folding from KnownBits propagation. This helps some vXi64 cases in 32-bit mode where constant vectors appear as vXi32 plus a bitcast, which can prevent getNode from constant folding sra/shl/srl.

Reviewers: RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54069

llvm-svn: 346102
2018-11-04 17:31:27 +00:00
Craig Topper 1ba86188cf [SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode.
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can hold these asserts, allowing them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.

The rest of the patch is just changing all callers to use getNode directly.

llvm-svn: 346087
2018-11-04 02:10:18 +00:00
Craig Topper 7aed9e600b [X86] Update comment I forgot to change in r346043. NFC
llvm-svn: 346073
2018-11-03 19:49:13 +00:00
Reid Kleckner 2bcb288ade [codeview] Let the X86 backend tell us the VFRAME offset adjustment
Use MachineFrameInfo's OffsetAdjustment field to pass this information
from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for
any other purpose.

This fixes PR38857 in the case where there is a non-aligned quantity of
CSRs and a non-aligned quantity of locals.

llvm-svn: 346062
2018-11-03 00:41:52 +00:00
Craig Topper f7108aef14 [X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL
The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies.

llvm-svn: 346050
2018-11-02 22:48:02 +00:00
Wouter van Oortmerssen de28b5d17f [WebAssembly] Parsing missing directives to produce valid .o
Summary:
The assembler was able to assemble and then dump back to .s, but
was failing to parse certain directives necessary for valid .o
output:
- .type directives are now recognized to distinguish function symbols
  and others.
- .size is now parsed to provide function size.
- .globaltype (introduced in https://reviews.llvm.org/D54012) is now
  recognized to ensure symbols like __stack_pointer have a proper type
  set for both .s and .o output.

Also added tests for the above.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, aheejin, dexonsmith, kristina, llvm-commits, sunfish

Differential Revision: https://reviews.llvm.org/D53842

llvm-svn: 346047
2018-11-02 22:04:33 +00:00
Craig Topper 60c202a494 [X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1
We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible.

I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests, which is why it's included here.

I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.

For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require fewer execution resources and be smaller code size.
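A scalar sketch of why an unpck with a zero vector acts as a zero extension (illustrative only, not the lowering code): interleaving each high source byte with a zero byte yields exactly that byte zero-extended to 16 bits.

```
#include <cstdint>
#include <cstdio>

// Interleave the high 8 bytes of src with zero bytes, as an unpckh
// with a zeroed second operand would: each result lane is the source
// byte with 0x00 in the high byte, i.e. the zero-extended value.
static void zext_high8(const uint8_t src[16], uint16_t out[8]) {
  for (int i = 0; i < 8; ++i)
    out[i] = (uint16_t)src[8 + i];
}

int main() {
  uint8_t src[16] = {0};
  src[15] = 0xAB;
  uint16_t out[8];
  zext_high8(src, out);
  printf("0x%04X\n", out[7]); // prints 0x00AB
  return 0;
}
```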

Differential Revision: https://reviews.llvm.org/D54024

llvm-svn: 346043
2018-11-02 21:09:49 +00:00
Alex Bradbury 52c27785ce [RISCV] Add some missing expansions for floating-point intrinsics
A number of intrinsics, such as llvm.sin.f32, would result in a failure to 
select. This patch adds expansions for the relevant selection DAG nodes, as 
well as exhaustive testing for all f32 and f64 intrinsics.

The codegen for FMA remains a TODO item, pending support for the various 
RISC-V FMA instruction variants.

The llvm.minimum.f32.* and llvm.maximum.* tests are commented-out, pending 
upstream support for target-independent expansion, as discussed in 
http://lists.llvm.org/pipermail/llvm-dev/2018-November/127408.html.

Differential Revision: https://reviews.llvm.org/D54034
Patch by Luís Marques.

llvm-svn: 346034
2018-11-02 19:50:38 +00:00
Heejin Ahn 5b023e07ea [WebAssembly] Fix bugs in rethrow depth counting and InstPrinter
Summary:
EH stack depth is incremented at `try` and decremented at `catch`. When
there is more than one catch instruction for a try instruction, we
shouldn't count non-first catches when calculating EH stack depths.

This patch fixes two bugs:
- CFGStackify: Exclude `catch_all` in the terminate catch pad when
  calculating the EH pad stack, because when we have multiple catches
  for a try we should count only the first catch instruction.
- InstPrinter: The initial intention was also to exclude non-first
  catches, but it didn't account for nested try-catches, so it failed on
  this case:
```
try
  try
  catch
  end
catch    <-- (1)
end
```
In the example, when we are at catch (1), the last seen EH
instruction is not `try` but `end_try`, breaking that assumption.
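A minimal standalone sketch of depth counting that handles both bugs (an illustration, not the CFGStackify/InstPrinter code): keep one flag per open `try` so only the first `catch` of each try adjusts the depth, and pop at `end` so an outer `catch` still pairs with the right try even when the last seen instruction was `end_try`:

```
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

int main() {
  // The nested case from above: the outer catch follows an end,
  // not a try, but still pairs with the outer frame.
  std::vector<std::string> insts = {"try", "try", "catch", "end",
                                    "catch", "end"};
  int depth = 0;
  std::vector<bool> sawCatch; // one flag per open try
  for (const std::string &i : insts) {
    if (i == "try") {
      ++depth;
      sawCatch.push_back(false);
    } else if (i == "catch" || i == "catch_all") {
      assert(!sawCatch.empty());
      if (!sawCatch.back()) { // only the first catch counts
        sawCatch.back() = true;
        --depth;
      }
    } else if (i == "end") {
      assert(!sawCatch.empty());
      sawCatch.pop_back();
    }
    printf("%s -> depth %d\n", i.c_str(), depth);
  }
  return 0;
}
```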

We won't need these fixes after we switch to the second proposal,
because there will be only one `catch` instruction. Until then, though,
these bugfixes are necessary to keep trunk in a working state.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53819

llvm-svn: 346029
2018-11-02 18:38:52 +00:00
Matthias Braun 5f7cb79e94 ARMExpandPseudoInsts: Fix CMP_SWAP expansion adding a kill flag to a def
llvm-svn: 346026
2018-11-02 18:22:15 +00:00
Jonas Paulsson cced2a2775 [SystemZ::TTI] Improve cost handling of uint/sint to fp conversions.
Let i8/i16 uint/sint to fp conversions cost 1 if the operand is a load.

Since the load already does the extension, there is no extra cost (previously
returned 2).

Review: Ulrich Weigand
https://reviews.llvm.org/D54028

llvm-svn: 346009
2018-11-02 17:53:31 +00:00
Sylvestre Ledru df92dabaef Fixed inclusion of M_PI for MinGW-w64
Patch by KOLANICH

llvm-svn: 346000
2018-11-02 17:25:40 +00:00
Jonas Paulsson 79f2441eee [SystemZ] Rework getInterleavedMemoryOpCost()
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.

This makes it more readable and just slightly more accurate generally.

Review: Ulrich Weigand
https://reviews.llvm.org/D53071

llvm-svn: 345998
2018-11-02 17:15:36 +00:00
Krzysztof Parzyszek f070544f8e [Hexagon] Do not reduce load size for globals in small-data
Small-data (i.e. GP-relative) loads and stores allow a 16-bit scaled
offset. For a load of a value of type T, the small-data area is
equivalent to an array "T sdata[65536]". This implies that objects
of smaller sizes need to be closer to the beginning of sdata,
while larger objects may be farther away; otherwise the offset
may be insufficient to reach them. Similarly, an object of a larger
size should not be accessed via a load of a smaller size.
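A sketch of the addressing constraint being described (illustrative arithmetic only, not backend code): the 16-bit offset field is scaled by the access size, so the reachable byte offset grows with the size of the load:

```
#include <cstdint>
#include <cstdio>

// GP-relative reachability: a 16-bit offset scaled by the access size,
// i.e. for an access of size S the area behaves like "T sdata[65536]".
static bool gpReachable(uint64_t byteOffset, unsigned accessSize) {
  return byteOffset % accessSize == 0 &&
         byteOffset / accessSize <= 65535;
}

int main() {
  printf("%d\n", gpReachable(100000, 1)); // 0: a 1-byte load falls short
  printf("%d\n", gpReachable(100000, 8)); // 1: an 8-byte load reaches it
  return 0;
}
```

This is why reducing a wide load of a far-away small-data global to a narrower load can make the offset unencodable.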

llvm-svn: 345975
2018-11-02 14:17:47 +00:00
Alexey Bataev 8831ef7a16 [DEBUGINFO, NVPTX] Do not emit ',debug' option if no debug info or only debug directives are requested.
Summary:
If only the output of debug directives is requested, we should drop
emission of the ',debug' option from the target directive. This is
required to support the nvprof profiler.

Reviewers: probinson, echristo, dblaikie

Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl

Differential Revision: https://reviews.llvm.org/D46061

llvm-svn: 345972
2018-11-02 13:47:47 +00:00
Neil Henning 7d1b77df57 [AMDGPU] UBSan bug fix for r345710
UBSan detected an error in our ISelLowering that is exposed only when
you have a dmask == 0x1. Fix this by adding an explicit check to
ensure we don't perform the shl by 32 that UBSan detected.
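The general shape of the bug and the fix, as a generic illustration (not the actual ISelLowering code): shifting a 32-bit value by 32 is undefined behavior in C++, so the boundary case needs an explicit guard:

```
#include <cstdint>
#include <cstdio>

// Guard the width-sized shift instead of computing (1 << 32) - 1,
// which UBSan rightly flags as undefined.
static uint32_t lowMask(unsigned bits) {
  return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

int main() {
  printf("0x%X 0x%X\n", lowMask(1), lowMask(32)); // 0x1 0xFFFFFFFF
  return 0;
}
```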

llvm-svn: 345962
2018-11-02 10:24:57 +00:00
Matt Arsenault 8e0269ba0b AMDGPU: Fix assertion with bitcast from i64 constant to v4i16
llvm-svn: 345922
2018-11-02 02:43:55 +00:00
Wouter van Oortmerssen 3231e518a3 [WebAssembly] Added a .globaltype directive to .s output.
Summary:
Assembly output can use globals like __stack_pointer implicitly,
but has no way of indicating the type of such a global, which makes
it hard for tools processing it (such as the MC Assembler) to
reconstruct this information.

The improved assembler directives parsing (in progress in
https://reviews.llvm.org/D53842) will make use of this information.

Also deleted code for the .import_global directive which was unused.

New test case in userstack.ll

Reviewers: dschuff, sbc100

Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54012

llvm-svn: 345917
2018-11-02 00:45:00 +00:00
Thomas Lively b2382c8bf7 [WebAssembly] General vector shift lowering
Summary: Adds support for lowering non-splat shifts.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53625

llvm-svn: 345916
2018-11-02 00:39:57 +00:00
Thomas Lively fb84fd7c8e [WebAssembly] Expand inserts and extracts with variable indices
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53964

llvm-svn: 345913
2018-11-02 00:06:56 +00:00
Mandeep Singh Grang 547a0d765a [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for the AArch64 Windows platform.

Patch by: Yin Ma (yinma@codeaurora.org)

Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53996

llvm-svn: 345909
2018-11-01 23:22:25 +00:00
Farhana Aleen 5853762e5a [AMDGPU] Handle the idot8 pattern generated by FE.
Summary: Different variants of idot8 codegen DAG patterns are not generated by llvm-tblgen due to a huge
         increase in compile time. Support the pattern that the clang FE generates after reordering the
         additions in the integer-dot8 source-language pattern.

Author: FarhanaAleen

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D53937

llvm-svn: 345902
2018-11-01 22:48:19 +00:00
Mandeep Singh Grang df19e57a1c [COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic
Reviewers: rnk, mstorsjo, efriedma, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53962

llvm-svn: 345892
2018-11-01 21:23:47 +00:00
Heejin Ahn 2e398976ba [WebAssembly] Fix signature parsing for 'try' in AsmParser
Summary:
Like `block` or `loop`, `try` can take an optional signature which can
be omitted. This patch allows `try`'s signature to be omitted. Also
added some tests for EH instructions.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53873

llvm-svn: 345888
2018-11-01 20:32:15 +00:00
Reid Kleckner 4af6025f09 [Hexagon] Remove unintended fallthrough from MC duplex code
I added these annotations in r345878 because I wasn't sure if the
fallthrough was intended. Krzysztof Parzyszek confirmed that they should
be breaks, so that's what this patch does.

Reviewers: kparzysz

Differential Revision: https://reviews.llvm.org/D53991

llvm-svn: 345883
2018-11-01 19:59:27 +00:00
Reid Kleckner 4dc0b1ac60 Fix clang -Wimplicit-fallthrough warnings across llvm, NFC
This patch should not introduce any behavior changes. It mostly
consists of one of two changes (see the sketch after this list):
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
   of only 'break'.
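A sketch of both patterns, assuming a C++17 compiler (where LLVM_FALLTHROUGH expands to the standard `[[fallthrough]]` attribute):

```
#include <cstdio>

void handle(int kind) {
  switch (kind) {
  case 0:
    printf("zero ");
    [[fallthrough]]; // pattern 1: annotate an intended fallthrough
  case 1:
    printf("small\n");
    break; // pattern 2: explicit break instead of falling into 'break'
  default:
    break;
  }
}

int main() {
  handle(0); // prints "zero small"
  handle(1); // prints "small"
  return 0;
}
```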

We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
   doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
   the outer case.

I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.

Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu

Differential Revision: https://reviews.llvm.org/D53950

llvm-svn: 345882
2018-11-01 19:54:45 +00:00
Sam Clegg ddf049869a [WebAssembly] Fixup `main` signature by default
Differential Revision: https://reviews.llvm.org/D53396

llvm-svn: 345880
2018-11-01 19:38:44 +00:00
Reid Kleckner bebc53f838 Annotate possibly unintended fallthroughs in Hexagon MC code, NFC
Clang's -Wimplicit-fallthrough check fires on these switch cases. GCC
does not warn when a case body that ends in a switch falls through to a
case label of an outer switch.

It's not clear whether these fallthroughs are truly intended. The Hexagon
tests pass regardless of whether these case blocks fall through or
break.

For now, I have applied the intended fallthrough annotation macro with a
FIXME comment to unblock enabling the warning. I will send a follow-up
patch that converts them to breaks to the Hexagon maintainers.

llvm-svn: 345878
2018-11-01 19:32:04 +00:00
Volkan Keles 0a8dc9eb0f [GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.

Reviewers: dsanders, aemerson, aditya_nandakumar

Reviewed By: dsanders

Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53734

llvm-svn: 345875
2018-11-01 19:01:53 +00:00
Simon Pilgrim b34a052852 [LegalizeDAG] Add generic vector CTPOP expansion (PR32655)
This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion.
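For reference, the 'bitmath' expansion in question is the classic branch-free popcount; applied lane-wise it works for vectors as well. A scalar sketch:

```
#include <cstdint>
#include <cstdio>

// Classic bit-math population count: sum bits in 2-, 4-, then 8-bit
// groups, then add the bytes together with a multiply.
static uint32_t ctpop32(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                 // pairwise bit sums
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // nibble sums
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // byte sums
  return (v * 0x01010101u) >> 24;                   // horizontal add
}

int main() {
  printf("%u\n", ctpop32(0xF0F01234u)); // prints 13
  return 0;
}
```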

Differential Revision: https://reviews.llvm.org/D53258

llvm-svn: 345869
2018-11-01 18:22:11 +00:00
Reid Kleckner ba982b5f8f [Hexagon] Fix MO_JumpTable const extender conversion
Previously this case fell through to unreachable, so it is clearly not
covered by any test case in LLVM. It may be dynamically unreachable, in
fact. However, if it were to run, this is what it would logically do.
The assert suggests that the intended behavior was not to allow folding
offsets from jump table indices, which makes sense.

llvm-svn: 345868
2018-11-01 18:14:45 +00:00
Reid Kleckner eb56894a4b [AArch64] Fix unintended fallthrough and strengthen cast
This was added in r330630. GCC's -Wimplicit-fallthrough seems to not
fire when the previous case contains a switch itself.

This fallthrough was benign because the helper function implementing the
case used dyn_cast to re-check the type of the node in question. After
fixing the fallthrough, we can strengthen the cast.

llvm-svn: 345864
2018-11-01 18:02:27 +00:00
Mandeep Singh Grang b0cdf56dd7 Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64"
This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2.

llvm-svn: 345863
2018-11-01 17:53:57 +00:00
Sam Parker 48fbf752b0 [ARM] Attempt to fix ppc64be buildbot
llvm-svn: 345850
2018-11-01 16:44:45 +00:00
Sam Parker 84a2f8b364 [ARM][CGP] Negative constant operand handling
While mutating instructions, we sign-extended negative constant
operands for binary operators that can safely overflow. This was to
allow instructions, such as add nuw i8 %a, -2, to still be able to
perform a subtraction. However, the code to handle constants doesn't
take into consideration that instructions, such as sub nuw i8 -2, %a,
require the i8 -2 to be converted into i32 254.
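A scalar illustration of why the conversion matters (my example, assuming the promoted value is later consumed with full-width unsigned semantics): sign and zero extension agree in the low 8 bits, but only zero extension gives a promoted value that matches the i8 arithmetic:

```
#include <cstdint>
#include <cstdio>

int main() {
  uint32_t a = 1;
  uint32_t sext = (uint32_t)(int32_t)(int8_t)-2; // 0xFFFFFFFE
  uint32_t zext = (uint8_t)-2;                   // 254
  // For "sub nuw i8 -2, %a" promoted to i32, only the zext form gives
  // a promoted result equal to the original i8 result (254 - 1 = 253):
  printf("sext: %u\n", sext - a); // 4294967293: wrong as a full value
  printf("zext: %u\n", zext - a); // 253: matches the i8 semantics
  return 0;
}
```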

This is a relatively simple fix, but I've taken the time to
reorganise the code a bit: mainly, instructions that can be promoted
are now cached, and the Mutate function has been split up.

Differential Revision: https://reviews.llvm.org/D53972

llvm-svn: 345840
2018-11-01 15:23:42 +00:00
Simon Pilgrim d5d7224355 [X86][X86FixupLEA] Rename processInstructionForSLM to processInstructionForSlowLEA (NFCI)
The function isn't SLM-specific (it's driven by the FeatureSlowLEA flag).

Minor tidyup prior to PR38225.

llvm-svn: 345836
2018-11-01 14:57:07 +00:00
Aleksandar Beserminji b9c840c9f0 [mips][micromips] Fix JmpLink to TargetExternalSymbol
When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...',
the wrong instruction, JALR16_MM, is selected. This patch adds the
missing pattern for JmpLink so that the JAL instruction is selected.

Differential Revision: https://reviews.llvm.org/D53366

llvm-svn: 345830
2018-11-01 13:57:54 +00:00
Chad Rosier 1546efd4a7 [AArch64] Add support for ARMv8.4 in Saphira.
llvm-svn: 345827
2018-11-01 13:45:16 +00:00
Simon Pilgrim 1f0a8421ad [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs (reapplied)
Reapplying an updated version of rL345395 (reverted in rL345451), now that the issues noticed in PR39483 have been fixed.

This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs.

llvm-svn: 345824
2018-11-01 11:52:09 +00:00
Stefan Maksimovic cd0c50e3d2 [Mips] Conditionally remove successor block
In MipsBranchExpansion::splitMBB, upon splitting a block with two
direct branches, remove the successor of the newly created block
(which inherits successors from the original block) that is pointed
to by the last branch in the original block, but only if the targets
of the two branches differ.

This is to fix the failing test when run with
-verify-machineinstrs enabled.

Differential Revision: https://reviews.llvm.org/D53756

llvm-svn: 345821
2018-11-01 10:10:42 +00:00
Jonas Paulsson 6749c24f40 [SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion
Scalar i1 to fp conversions are done with a branch sequence, so they
should have a higher cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D53924

llvm-svn: 345818
2018-11-01 09:05:32 +00:00
Jonas Paulsson f15a53bc81 [SystemZ::TTI] Accurate costs for i1->double vector conversions
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.

Review: Ulrich Weigand
https://reviews.llvm.org/D53923

llvm-svn: 345817
2018-11-01 09:01:51 +00:00
Li Jia He 03170a904f [PowerPC] Support constraint 'wi' in asm
From the GCC manual, we can see that the 'wi' inline asm constraint means “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D53265

llvm-svn: 345810
2018-11-01 02:35:17 +00:00
Matthias Braun a9f900561e X86: Consistently declare pass initializers in X86.h; NFC
This avoids declaring them twice: in X86TargetMachine.cpp and the file
implementing the pass.

llvm-svn: 345801
2018-11-01 00:38:01 +00:00
Thomas Lively d4891a1b7a [WebAssembly] Lower vselect
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53630

llvm-svn: 345797
2018-11-01 00:01:02 +00:00
Thomas Lively b61232eacd [WebAssembly] Process p2align operands for SIMD loads and stores
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53886

llvm-svn: 345795
2018-10-31 23:58:20 +00:00
Thomas Lively 6ff31fe34d [WebAssembly] Handle vector IMPLICIT_DEFs.
Summary:
Also reduce the test case for implicit defs and test it with all
register classes.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53855

llvm-svn: 345794
2018-10-31 23:50:53 +00:00
Mandeep Singh Grang 88ad9ac720 [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for the AArch64 Windows platform.

Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53673

llvm-svn: 345791
2018-10-31 23:16:20 +00:00
Evandro Menezes 3a06c46470 [AArch64] Sort switch cases (NFC)
llvm-svn: 345786
2018-10-31 21:56:49 +00:00