Commit Graph

47199 Commits

Author SHA1 Message Date
Craig Topper bc895a3afc [X86] Enable popcnt false dependency breaking on Silvermont and Goldmont.
Silvermont and Goldmont have the same issue on popcnt as Sandy Bridge, Haswell, Broadwell, and Skylake. Believe it is fixed in Goldmont Plus.

llvm-svn: 330358
2018-04-19 19:25:24 +00:00
Simon Pilgrim 4ba057dbd1 [X86][SLM] Fix typo using SandyBridge resources.
Luckily this was on instructions not supported on Silvermont....

llvm-svn: 330351
2018-04-19 18:01:52 +00:00
Craig Topper b5f2659130 [X86] Correct the scheduling data for register forms of XCHG and XADD on Intel CPUs.
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.

XADD is probably 2 moves and an add also using a temporary register.

Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.

llvm-svn: 330349
2018-04-19 18:00:17 +00:00
Simon Pilgrim 5e492d29a3 [X86] Merge some MMX instregex
There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first.

llvm-svn: 330347
2018-04-19 17:32:10 +00:00
Krzysztof Parzyszek 2a9a83cd3f [Hexagon] Use legal types when lowering CONCAT_VECTORS via BUILD_VECTOR
llvm-svn: 330344
2018-04-19 17:11:58 +00:00
Mark Searles 1bc6e71f32 [AMDGPU] Do not only rely on BB number when finding bottom loop
We should also check that the "bottom" basic block of a loopis a successor of the "header" basic block, otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem with Wolfsentein 2, because of one vector-memory wait was missing.

Differential Revision: https://reviews.llvm.org/D43831

llvm-svn: 330337
2018-04-19 15:42:30 +00:00
Krzysztof Parzyszek d92c37e090 [Hexagon] Generate code for vector bswap intrinsics
llvm-svn: 330333
2018-04-19 14:46:44 +00:00
Simon Pilgrim f21ace6cdd [X86][BtVer2] Remove SSE4A EXTRQ/EXTRQI InstRW overrides.
These are already handled identically by WriteALU.

llvm-svn: 330332
2018-04-19 14:38:36 +00:00
Krzysztof Parzyszek 23bcf06a15 [Hexagon] Add/fix patterns for 32/64-bit vector compares and logical ops
llvm-svn: 330330
2018-04-19 14:24:31 +00:00
Simon Dardis 5d61c8b225 [mips] Correct the definitions of the unaligned word memory operation instructions
These instructions lacked the correct predicates, were not marked
as loads and stores and lacked the proper instruction mapping information.

In the case of microMIPS sw(l|r)e (EVA) these instructions were using the load
EVA description.

Reviewers: abeserminji, smaksimovic, atanasyan

Differential Revision: https://reviews.llvm.org/D45626

llvm-svn: 330326
2018-04-19 13:33:51 +00:00
Alexander Ivchenko e8fed1546e Lowering x86 adds/addus/subs/subus intrinsics (llvm part)
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D44785

llvm-svn: 330322
2018-04-19 12:13:30 +00:00
Simon Pilgrim 3c06617f0e [X86][FMA] Remove FMA reg-reg InstRW scheduler overrides.
These are all already handled identically by WriteFMA.

llvm-svn: 330319
2018-04-19 11:37:26 +00:00
Simon Pilgrim 33dede9075 [X86][BtVer2] Remove 128-bit F16C InstRW overrides.
These are already handled identically by WriteCvtF2F.

llvm-svn: 330318
2018-04-19 11:16:33 +00:00
Simon Dardis fdc052686c [mips] Guard some macro expansions properly
Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D45565

llvm-svn: 330315
2018-04-19 09:45:04 +00:00
Sander de Smalen 50d8702f26 [AArch64][AsmParser] NFC: Cleanup parsing of scalar registers.
Summary:
- Renamed tryParseRegister to tryParseScalarRegister, which
  now returns an OperandMatchResultTy.
- Moved matching of certain aliases into matchRegisterNameAlias.
- Changed type of most 'Reg' variables to 'unsigned'.

This is patch [1/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro, samparker

Reviewed By: samparker

Subscribers: samparker, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45687

llvm-svn: 330311
2018-04-19 07:35:08 +00:00
Craig Topper f846e2d1b1 [X86] Scrub scheduling information for MUL/IMUL on Intel CPUs.
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.

llvm-svn: 330308
2018-04-19 05:34:05 +00:00
Bob Haarman cb80a3fce0 Fix data race in X86FloatingPoint.cpp ASSERT_SORTED
Summary:
ASSERT_SORTED checks if a table is sorted, and uses a boolean to
prevent the check from being run again if it was earlier determined
that the table is in fact sorted. Unsynchronized reads and writes of
that boolean triggered ThreadSanitizer's data race detection. This
change rewrites the code to use std::atomic<bool> instead.

Fixes PR36922.

Reviewers: rnk

Reviewed By: rnk

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D45742

llvm-svn: 330301
2018-04-18 23:04:09 +00:00
Craig Topper ebf52e80c1 [X86] Correct the Defs, Uses, hasSideEffects, mayLoad, mayStore for XCHG and XADD instructions.
I don't think we emit any of these from codegen except for using XCHG16ar as 2 byte NOP.

llvm-svn: 330298
2018-04-18 22:07:53 +00:00
Artem Belevich 0ae8590354 [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
The new instructions were added added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

llvm-svn: 330296
2018-04-18 21:51:48 +00:00
Alex Bradbury 3ff2022bb9 [RISCV] Introduce pattern for materialising immediates with 0 for lower 12 bits
These immediates can be materialised with just an lui, rather than an lui+addi 
pair.

llvm-svn: 330293
2018-04-18 20:34:23 +00:00
Craig Topper 04244cbf45 [X86] Fix the Uses/Defs,mayLoad,mayStore,hasSideEffects flags for the CMPXCHG instructions.
The compiler only emits the locked version of these which use different instruction definitions. The versions fixed here are only used by the assembler/disassembler.

llvm-svn: 330287
2018-04-18 20:15:00 +00:00
Alex Bradbury 099c720426 Revert "[RISCV] implement li pseudo instruction"
Reverts rL330224, while issues with the C extension and missed common
subexpression elimination opportunities are addressed. Neither of these issues
are visible in current RISC-V backend unit tests, which clearly need
expanding.

llvm-svn: 330281
2018-04-18 19:02:31 +00:00
Lei Huang 192c6ccf6d [Power9]Legalize and emit code for converting Unsigned HWord/Char to Quad-Precision
Legalize and emit code for converting unsigned HWord/Char to QP:

xscvsdqp
xscvudqp

Only covering patterns for unsigned forms cause we don't have part-word
sign-extending integer loads into VSX registers.

Differential Revision: https://reviews.llvm.org/D45494

llvm-svn: 330278
2018-04-18 17:41:46 +00:00
Amara Emerson 9de072f8ae [AArch64] Add isel pattern for v8i8->v2f32 NVCASTs.
rdar://39454635

llvm-svn: 330276
2018-04-18 17:10:19 +00:00
Lei Huang 198e678576 [Power9]Legalize and emit code for converting (Un)Signed Word to Quad-Precision
Legalize and emit code for converting (Un)Signed Word to quad-precision via:

xscvsdqp
xscvudqp

Differential Revision: https://reviews.llvm.org/D45389

llvm-svn: 330273
2018-04-18 16:34:22 +00:00
Alexey Bataev 242706b8d1 [DEBUG] Initial adaptation of NVPTX target for debug info emission.
Summary:
Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.

Reviewers: echristo, jlebar, tra, jholewinski

Subscribers: mgorny, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41827

llvm-svn: 330271
2018-04-18 16:13:41 +00:00
Chandler Carruth ccd3ecb95a [x86] Switch EFLAGS copy lowering to use reg-reg form of testing for
a zero register.

Previously I tried this and saw LLVM unable to transform this to fold
with memory operands such as spill slot rematerialization. However, it
clearly works as shown in this patch. We turn these into `cmpb $0,
<mem>` when useful for folding a memory operand without issue. This form
has no disadvantage compared to `testb $-1, <mem>`. So overall, this is
likely no worse and may be slightly smaller in some cases due to the
`testb %reg, %reg` form.

Differential Revision: https://reviews.llvm.org/D45475

llvm-svn: 330269
2018-04-18 15:52:50 +00:00
Chandler Carruth 1f87618f8f [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses
across basic blocks in the limited cases where it is very straight
forward to do so.

This will also be useful for other places where we do some limited
EFLAGS propagation across CFG edges and need to handle copy rewrites
afterward. I think this is rapidly approaching the maximum we can and
should be doing here. Everything else begins to require either heroic
analysis to prove how to do PHI insertion manually, or somehow managing
arbitrary PHI-ing of EFLAGS with general PHI insertion. Neither of these
seem at all promising so if those cases come up, we'll almost certainly
need to rewrite the parts of LLVM that produce those patterns.

We do now require dominator trees in order to reliably diagnose patterns
that would require PHI nodes. This is a bit unfortunate but it seems
better than the completely mysterious crash we would get otherwise.

Differential Revision: https://reviews.llvm.org/D45673

llvm-svn: 330264
2018-04-18 15:13:16 +00:00
David Stuttard 31f482c26b [AMDGPU] Fix issues for backend divergence tracking
Summary:
A change to use divergence analysis in the AMDGPU backend was getting formal
arguments incorrect (not tagged as divergent) unless they were VGPR0, VGPR1 or
VGPR2

For graphics shaders it is possible to have more than these passed in as VGPR

Modified the checking code to check for any VGPR registers passed in as formal
arguments.

Also, some intrinsics that are sources of divergence may have been lowered
during instruction selection and are missed on subsequent calls to
isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well.

Finally, the FunctionLoweringInfo tracks virtual registers that are live across
basic block boundaries. This is used to check for divergence of CopyFromRegister
registers using the DivergenceAnalysis analysis. For multiple blocks the lazily
evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45372

Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3
llvm-svn: 330257
2018-04-18 13:53:31 +00:00
Craig Topper dfccafe18a [X86][Broadwell] Remove some unnecessary InstRW overrides and add some FIXMEs.
llvm-svn: 330241
2018-04-18 06:41:25 +00:00
Craig Topper 513e11bb70 [X86] Give CMOV 2 cycle latency on SLM.
llvm-svn: 330239
2018-04-18 06:04:30 +00:00
Craig Topper 8704612481 [X86] Don't crash on bad operand modifiers in inline assembly
Summary: Previously if a modifer was placed on a non-GPR register class we would hit an assert or crash.

Reviewers: echristo

Reviewed By: echristo

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D45751

llvm-svn: 330238
2018-04-18 05:15:24 +00:00
Stanislav Mekhanoshin 8b20b7dc2b [AMDGPU] Enabled v2.16 literals for VOP3P
Literal encoding needs op_sel_hi to select low 16 bit in this case.

Differential Revision: https://reviews.llvm.org/D45745

llvm-svn: 330230
2018-04-17 23:09:05 +00:00
Alex Bradbury 480b7bc906 [RISCV] implement li pseudo instruction
The implementation follows the MIPS backend and expands the
pseudo instruction directly during asm parsing. As the result, only
real MC instructions are emitted to the MCStreamer. Additionally,
PseudoLI instructions are emitted during codegen. The actual
expansion to real instructions is performed during MI to MC lowering
and is similar to the expansion performed by the GNU Assembler.

Differential Revision: https://reviews.llvm.org/D41949
Patch by Mario Werner.

llvm-svn: 330224
2018-04-17 21:56:40 +00:00
Keith Wyss 3d86823f3d [XRay] Typed event logging intrinsic
Summary:
Add an LLVM intrinsic for type discriminated event logging with XRay.
Similar to the existing intrinsic for custom events, but also accepts
a type tag argument to allow plugins to be aware of different types
and semantically interpret logged events they know about without
choking on those they don't.

Relies on a symbol defined in compiler-rt patch D43668. I may wait
to submit before I can see demo everything working together including
a still to come clang patch.

Reviewers: dberris, pelikan, eizan, rSerge, timshen

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45633

llvm-svn: 330219
2018-04-17 21:30:29 +00:00
Heejin Ahn 2b8158f441 [WebAssembly] Add an assertion for an invalid CFG
Summary:
It was not easy to provide a test case for D45648 (rL330079) because the bug
didn't manifest itself in the set of currently valid IRs. Added an assertion to
check this faster, thanks to @dblaikie's suggestion.

Reviewers: dblaikie

Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits, dblaikie

Differential Revision: https://reviews.llvm.org/D45711

llvm-svn: 330217
2018-04-17 21:19:21 +00:00
Dan Gohman 4576dc06be [WebAssembly] Teach fast-isel to gracefully recover from illegal return types.
Fixes PR36564.

llvm-svn: 330215
2018-04-17 20:46:42 +00:00
Craig Topper e56a2fc5e7 [X86] Add separate scheduling class for PSADBW instruction.
llvm-svn: 330204
2018-04-17 19:35:19 +00:00
Craig Topper 655e1db722 [X86] Remove unnecessary InstRW overrides. Add somes FIXMEs/TODOs.
llvm-svn: 330203
2018-04-17 19:35:14 +00:00
Krzysztof Parzyszek cc71291731 [Hexagon] Do not merge initializers for stack and non-stack expressions
Stack addressing needs addressing modes that provide an offset field
immediately following the frame index. An initializer from a non-stack
addressing could force the stack address to use a form that does not
provide an offset field.

llvm-svn: 330191
2018-04-17 15:23:09 +00:00
Simon Pilgrim 86e3c26924 [X86] Add FP comparison scheduler classes
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd

Differential Revision: https://reviews.llvm.org/D45656

llvm-svn: 330179
2018-04-17 07:22:44 +00:00
Mandeep Singh Grang 88a8b269b4 [RISCV] Fix assert message operator
Summary:
Specifying assert message with an || operator makes the compiler interpret it
 as a bool. Changed it to &&.

Reviewers: asb, apazos

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D45660

llvm-svn: 330148
2018-04-16 18:56:10 +00:00
Krzysztof Parzyszek c546434e08 [Hexagon] Turn off flag enabling auto-vectorization
It was turned on for testing and was accidentally left on in the commit.

llvm-svn: 330139
2018-04-16 17:35:30 +00:00
Lei Huang 42ab1d3d03 [NFC] Move verificaiton check for f128 conversion into LowerINT_TO_FP()
Move veriication check for legal conversions to f128 into LowerINT_TO_FP()
and fix some indentations to match other sections of the code for readability.

llvm-svn: 330138
2018-04-16 17:30:24 +00:00
Dmitry Preobrazhensky 4c45e6ff0e [AMDGPU][MC][VI][GFX9] Added support of SDWA/DPP for v_cndmask_b32
See bug 36356: https://bugs.llvm.org/show_bug.cgi?id=36356

Differential Revision: https://reviews.llvm.org/D45446

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 330123
2018-04-16 12:41:38 +00:00
Sander de Smalen 7a210db81e [AArch64][SVE] Asm: Support for structured LD4 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45624

llvm-svn: 330120
2018-04-16 10:46:18 +00:00
Sander de Smalen d239eb3ce3 [AArch64][SVE] Asm: Support for structured LD3 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45623

llvm-svn: 330116
2018-04-16 10:10:48 +00:00
Stefan Maksimovic 86d638ecdf [mips] Restrict certain trap instructions for micromipsr6
Instructions removed from micromipsr6:
teqi, tgei, tgeiu, tlti, tltiu, tnei

Differential Revision: https://reviews.llvm.org/D45318

llvm-svn: 330114
2018-04-16 09:22:20 +00:00
Gabor Buella 8f1646b579 [X86] Introduce archs: goldmont-plus & tremont
Using Goldmont's cost tables for these two upcoming
atom archs.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45612

llvm-svn: 330109
2018-04-16 07:47:35 +00:00
Sander de Smalen f836af869d [AArch64][SVE] Asm: Support for structured LD2 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45622

llvm-svn: 330108
2018-04-16 07:09:29 +00:00
Craig Topper 53f9558903 [X86] Use uint32_t instead of unsigned in GetLo32XForm for readability. NFC
GetLo8XForm right next to it uses uint8_t so uint32_t is consistent.

llvm-svn: 330104
2018-04-15 19:11:24 +00:00
Simon Pilgrim b8adf558f8 [X86][MMX] Set PAVG/PHADD/PMIN/PMAX/PSIGN instructions to use same scheduler classes as SSE/AVX
llvm-svn: 330085
2018-04-14 13:06:38 +00:00
Hiroshi Inoue ae17900997 [NFC] fix trivial typos in document and comments
"not not" -> "not" etc

llvm-svn: 330083
2018-04-14 08:59:00 +00:00
Heejin Ahn 92401cc1cf [WebAssembly] Fix a bug in MachineBasicBlock::findDebugLoc() call
Summary:
InsertPos is within the bacic block `Header`, so `findDebugLoc()` should
be called on not `MBB` but `Header` instead.

Reviewers: yurydelendik

Subscribers: jfb, dschuff, aprantl, sbc100, jgravelle-google, sunfish, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45648

llvm-svn: 330079
2018-04-14 00:12:12 +00:00
Craig Topper 95f421cfbf [X86] Add the bizarro movsww and movzww mnemonics for the disassembler.
The destination size of the movzx/movsx instruction is controlled by the normal operand size mechanisms. Only the input type is fixed.

This means that a 0x66 prefix on the encoding for zext/sext 16->32 should really produce a 16->16 instruction. Functionally this is equivalent to a GR16->GR16 move since bits 16 and above will be preserved. So nothing is actually extended.

llvm-svn: 330078
2018-04-13 23:57:54 +00:00
Tim Northover 271d3d2771 MachO: trap unreachable instructions
Debugability is more important than saving 4 bytes to let us to fall
through to nonense.

llvm-svn: 330073
2018-04-13 22:25:20 +00:00
Krzysztof Parzyszek 4bdf1aa416 [Hexagon] Initial instruction cost model for auto-vectorization
llvm-svn: 330065
2018-04-13 20:46:50 +00:00
Peter Collingbourne ab40e44ba1 Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses."
Caused a hang and eventually an assertion failure in LTO builds
of 7zip-benchmark on aarch64 iOS targets.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

llvm-svn: 330063
2018-04-13 20:21:00 +00:00
Krzysztof Parzyszek dfed941eec [LV] Introduce TTI::getMinimumVF
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.

Differential Revision: https://reviews.llvm.org/D45271

llvm-svn: 330062
2018-04-13 20:16:32 +00:00
Stefan Pintilie 118b8675c5 [Power9] Add the TLS store instructions to the Power 9 model
The Power 9 scheduler model should now include the TLS instructions.
We can now, once again, mark the model as complete.
From now on, if instructions are added to Power 9 but are not
added to the model the build should produce an error. Hopefully
that will alert the developer who is adding new instructions
that they should also be added to the scheulder model.

llvm-svn: 330060
2018-04-13 19:49:58 +00:00
Simon Dardis 9ec9f444cc [mips] Materialize constants for multiplication
Previously, the MIPS backend would alwyas break down constant multiplications
into a series of shifts, adds, and subs. This patch changes that so the cost of
doing so is estimated.

The cost is estimated against worst case constant materialization and retrieving
the results from the HI/LO registers.

For cases where the value type of the multiplication is not legal, the cost of
legalization is estimated and is accounted for before performing the
optimization of breaking down the constant

This resolves PR36884.

Thanks to npl for reporting the issue!

Reviewers: abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D45316

llvm-svn: 330037
2018-04-13 16:09:07 +00:00
Simon Pilgrim 0e74e50401 [X86] Remove remaining itinerary support from instructions and target (PR37093)
llvm-svn: 330035
2018-04-13 15:37:56 +00:00
Sjoerd Meijer 834f7dc7ab [ARM] FP16 vmaxnm/vminnm scalar instructions
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.

Differential Revision: https://reviews.llvm.org/D44675

llvm-svn: 330034
2018-04-13 15:34:26 +00:00
Yan Luo 545a0c2fb0 [ARC] Add LImm support for J/JL
llvm-svn: 330031
2018-04-13 15:10:34 +00:00
Simon Pilgrim a3a9d81231 [X86] Generalize X86FixupLEAs to work with TargetSchedModel
Similar to rL329834, don't rely on itinerary scheduler model to determine latencies for LEA thresholds, use the generic TargetSchedModel::computeInstrLatency call.

llvm-svn: 330030
2018-04-13 15:09:39 +00:00
Simon Pilgrim 01637c473f Remove comment reference to itineraries. NFCI.
llvm-svn: 330025
2018-04-13 14:42:48 +00:00
Sander de Smalen 5b12db5d23 [AArch64][SVE] Asm: Support for contiguous LD1 (scalar+imm) load instructions
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45618

llvm-svn: 330024
2018-04-13 14:41:36 +00:00
Simon Pilgrim fe3d59e98b [X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd
llvm-svn: 330023
2018-04-13 14:41:05 +00:00
Simon Pilgrim 21e89795cc [X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093)
llvm-svn: 330022
2018-04-13 14:36:59 +00:00
Simon Pilgrim e0c7868ded Remove comment references to itineraries. NFCI.
llvm-svn: 330021
2018-04-13 14:31:57 +00:00
Simon Pilgrim 963bf4de2b Remove out of data comment. NFCI.
llvm-svn: 330019
2018-04-13 14:24:06 +00:00
Sander de Smalen 5c62598b0d [AArch64][SVE] Asm: Support for contiguous ST1 (scalar+imm) store instructions.
Summary:
Added instructions for contiguous stores, ST1, with scalar+imm addressing
modes and corresponding tests. The patch also adds parsing of
'mul vl' as needed for the VL-scaled immediate.

This is patch [6/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45432

llvm-svn: 330014
2018-04-13 12:56:14 +00:00
Simon Pilgrim ae0c2711b6 [X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093)
llvm-svn: 330013
2018-04-13 12:50:31 +00:00
Ivan A. Kosarev f533a6e5aa [NEON] Support intrinsic for scalar and vector versions of the VRINTN instruction
Differential Revision: https://reviews.llvm.org/D45514

llvm-svn: 330011
2018-04-13 12:45:12 +00:00
Hiroshi Inoue 372ffa15cb [NFC] fix trivial typos in comments
"the the" -> "the", "we we" -> "we", etc

llvm-svn: 330006
2018-04-13 11:37:06 +00:00
Sander de Smalen ea626e37ba [AArch64][SVE] Asm: Add support for parsing and printing SVE vector lists.
Summary:
Added Z_(b|h|s|d) vector list RegisterOperands along with support to
add/print the vector lists.

This is patch [5/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45431

llvm-svn: 330000
2018-04-13 09:11:53 +00:00
Gabor Buella 604be4424b [X86] Introduce cldemote instruction
Hint to hardware to move the cache line containing the
address to a more distant level of the cache without
writing back to memory.

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45256

llvm-svn: 329992
2018-04-13 07:35:08 +00:00
Craig Topper 254ed028a4 [X86] Remove the pmuldq/pmuldq intrinsics and replace with native IR.
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.

We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.

llvm-svn: 329990
2018-04-13 06:07:18 +00:00
Simon Pilgrim 1f070c334c [X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers.
Was being used to move around empty/unused itineraries...

llvm-svn: 329970
2018-04-12 22:57:34 +00:00
Simon Pilgrim 6551d405dc [X86] Remove x86 InstrItinClass entries (PR37093)
This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093.

llvm-svn: 329967
2018-04-12 22:44:47 +00:00
Peter Collingbourne 00db326b0d AArch64: Introduce a DAG combine for folding offsets into addresses.
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. This reduces the size of Chromium
for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329956
2018-04-12 21:23:55 +00:00
Simon Pilgrim 0e45634f4e [X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093)
llvm-svn: 329953
2018-04-12 20:47:34 +00:00
Simon Pilgrim e9376b9fdc [X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093)
llvm-svn: 329945
2018-04-12 19:59:35 +00:00
Simon Pilgrim 577ae24feb [X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093)
llvm-svn: 329940
2018-04-12 19:25:07 +00:00
Sameer AbuAsal e8b7ff30e2 [RISCV] Add c.mv rs1, rs2 pattern for addi rs1, rs2, 0
Summary:
GCC compresses the pseudo instruction "mv rd, rs",  which is an alias of
"addi rd, rs, 0", to "c.mv rd, rs".

In LLVM we rely on the canonical MC instruction (MCInst) to do our compression
checks and since there is no rule to compress "addi rd, rs, 0" --> "c.mv
rd, rs" we lose this compression opportunity to gcc.

 In this patch we fix that by adding an addi to c.mv compression pattern, the
 instruction "mv rd, rs" will be compressed to "c.mv rd, rs" just like
 gcc does.

Patch by Zhaoshi Zheng (zzheng) and Sameer (sabuasal).

Reviewers: asb, apazos, zzheng, mgrang, shiva0217

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, niosHD, kito-cheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D45583

llvm-svn: 329939
2018-04-12 19:22:40 +00:00
Simon Pilgrim 35935c0632 [X86] Remove remaining gpr schedule itineraries (PR37093)
llvm-svn: 329938
2018-04-12 18:46:15 +00:00
Gabor Buella 297c138798 [X86] Introduce LLVM wbinvd intrinsic
A previously missing intrinsic for an old instruction.

Reviewers: craig.topper, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45312

llvm-svn: 329936
2018-04-12 18:38:18 +00:00
Simon Pilgrim dec781c141 [X86] Remove gpr shift/extension schedule itineraries (PR37093)
llvm-svn: 329933
2018-04-12 18:25:38 +00:00
Lei Huang 10367eb422 [Power9]Legalize and emit code for converting (Un)Signed DWord to Quad-Precision
Legalize and emit code for:

  * xscvsdqp
  * xscvudqp

Differential Revision: https://reviews.llvm.org/D45230

llvm-svn: 329931
2018-04-12 18:00:14 +00:00
Petar Jovanovic 667e213018 [MIPS GlobalISel] remove superfluous #includes (NFC)
Remove superfluous #includes.
Minor code style change in MipsCallLowering::lowerFormalArguments().

llvm-svn: 329926
2018-04-12 17:01:46 +00:00
Jessica Paquette 8aa6cd5cb9 [AArch64] Move AFI->setRedZone(false) to top of emitPrologue
AFI->setRedZone(false) was put in the wrong place before, and so it only fired
on functions that didn't have stack frames. This moves that to the top of
emitPrologue to make sure that every function without a redzone has it set
correctly.

This also adds a function representing one of the early exit cases (GHC calling
convention) to the MachineOutliner noredzone test to ensure that we can outline
from functions like these, where we never use a redzone.

llvm-svn: 329922
2018-04-12 16:16:18 +00:00
Simon Dardis d886aba39d [mips] Correct the predicates of the load/store (double)word for coprocessor 3.
llvm-svn: 329913
2018-04-12 14:41:38 +00:00
Simon Pilgrim 8904a86f65 [X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093)
llvm-svn: 329912
2018-04-12 14:31:42 +00:00
Sander de Smalen 525e3225c2 [AArch64][AsmParser] Unify 'addVectorListOperands' functions.
Summary:
Merged 'addVectorList64Operands' and 'addVectorList128Operands' into a
generic 'addVectorListOperands', which can be easily extended to work
for SVE vectors.

This is patch [4/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45430

llvm-svn: 329909
2018-04-12 13:19:32 +00:00
Simon Pilgrim 294556d40e [X86] Remove remaining system/special schedule itineraries (PR37093)
llvm-svn: 329906
2018-04-12 12:43:49 +00:00
Simon Dardis a5a3c38c3d [mips] Correct the predicates for special nops, tlb ctrl instrs, software breakpoint and prefx.
Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D44436

llvm-svn: 329905
2018-04-12 12:37:02 +00:00
Simon Pilgrim 0cd0fbd8c5 [X86] Remove system/control schedule itineraries (PR37093)
llvm-svn: 329903
2018-04-12 12:09:24 +00:00
Sander de Smalen 650234ba36 [AArch64][AsmParser] Make parse function for VectorLists generic to other vector types.
Summary:
Added 'RegisterKind' to the VectorListOp structure, so that this operand 
type can be reused for SVE vector lists in a later patch. It also
refactors the 'tryParseVectorList' function so it can be used directly
in the ParserMethod of an operand. The parsing can now parse multiple 
kinds of vectors and recover if there is no match.

This is patch [3/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45429

llvm-svn: 329900
2018-04-12 11:40:52 +00:00
Shiva Chen b48b027d05 [RISCV] Change function alignment to 4 bytes, and 2 bytes for RVC
Summary:

According RISC-V ELF psABI specification, base RV32 and RV64 ISAs only
allow 32-bit instruction alignment, but instruction allow to be aligned
to 16-bit boundaries for C-extension.

So we just align to 4 bytes and 2 bytes for C-extension is enough.

Reviewers: asb, apazos

Differential Revision: https://reviews.llvm.org/D45560

Patch by Kito Cheng.

llvm-svn: 329899
2018-04-12 11:30:59 +00:00
Simon Pilgrim 69e0e8e3d4 [X86] Remove CMOV/SETCC schedule itineraries (PR37093)
llvm-svn: 329898
2018-04-12 11:01:40 +00:00
Simon Pilgrim 10e3bdaaa8 [X86] Remove MMX/3DNow schedule itineraries (PR37093)
llvm-svn: 329896
2018-04-12 10:49:57 +00:00
Simon Pilgrim 32d368147f [X86] Remove X87 schedule itineraries (PR37093)
First of a number of commits to remove x86 schedule itineraries entirely - approved off-line with @craig.topper

llvm-svn: 329893
2018-04-12 10:27:37 +00:00
Jonas Paulsson 319ce96fe4 [SystemZ] Use ResourceCycles=30 for FPd unit (NFC).
This is better than listing FPd 30 times :-)

Review: Ulrich Weigand
llvm-svn: 329887
2018-04-12 08:08:42 +00:00
Jonas Paulsson e3f53e5d14 [SystemZ] Remove FullInstRWOverlapCheck from SchedMachineModels.
This is NFC, even though it caught just a few cases of overlapping regular
expressions.

Review: Ulrich Weigand
llvm-svn: 329886
2018-04-12 08:06:04 +00:00
Jonas Paulsson 26e171f0a7 [HexagonMachineScheduler] Remove local (copied) getWeakLeft().
Since the common code getWeakLeft() is now available, there should not
be a local copy of this function in target.

llvm-svn: 329885
2018-04-12 07:39:33 +00:00
Jonas Paulsson e8f1ac7063 [MachineScheduler] NFC refactoring
This patch makes tryCandidate() virtual and some utility functions like
tryLess(), tryGreater(), ... externally available (used to be static).

This makes it possible for a target to derive a new MachineSchedStrategy from
GenericScheduler and reuse most parts.

It was necessary to wrap functions with the same names in
AMDGPU/SIMachineScheduler in a local namespace.

Review: Andy Trick, Florian Hahn
https://reviews.llvm.org/D43329

llvm-svn: 329884
2018-04-12 07:21:39 +00:00
Alex Bradbury 21d28fe8b8 [RISCV] Codegen support for RV32D floating point comparison operations
Also add double-prevoius-failure.ll which captures a test case that at one
point triggered a compiler crash, while developing calling convention support
for f64 on RV32D with soft-float ABI.

llvm-svn: 329877
2018-04-12 05:50:06 +00:00
Alex Bradbury 60baa2e015 [RISCV] Codegen support for RV32D floating point conversion operations
This also includes support and a test for truncating stores, which are now
possible thanks to the fpround pattern.

llvm-svn: 329876
2018-04-12 05:47:15 +00:00
Alex Bradbury 5d0dfa5e0e [RISCV] Add codegen support for RV32D floating point arithmetic operations
llvm-svn: 329874
2018-04-12 05:42:42 +00:00
Alex Bradbury 0b4175f160 [RISCV] Codegen support for RV32D floating point load/store, fadd.d, calling conv
fadd.d is required in order to force floating point registers to be used in
test code, as parameters are passed in integer registers in the soft float
ABI.

Much of this patch is concerned with support for passing f64 on RV32D with a
soft-float ABI. Similar to Mips, introduce pseudoinstructions to build an f64
out of a pair of i32 and to split an f64 to a pair of i32. BUILD_PAIR and
EXTRACT_ELEMENT can't be used, as a BITCAST to i64 would be necessary, but i64
is not a legal type.

llvm-svn: 329871
2018-04-12 05:34:25 +00:00
Yan Luo bedca0b41b Test commit access
llvm-svn: 329870
2018-04-12 04:26:49 +00:00
Simon Pilgrim 7b88d09e75 [X86] Remove unused itinerary argument from FMA3/FMA4/XOP instructions. NFCI.
llvm-svn: 329862
2018-04-11 23:24:38 +00:00
whitequark 1ae61a6126 [LLVM-C] Add LLVMGetHostCPU{Name,Features}.
Without these functions it's hard to create a TargetMachine for
Orc JIT that creates efficient native code.

It's not sufficient to just expose LLVMGetHostCPUName(), because
for some CPUs there's fewer features actually available than
the CPU name indicates (e.g. AVX might be missing on some CPUs
identified as Skylake).

Differential Revision: https://reviews.llvm.org/D44861

llvm-svn: 329856
2018-04-11 22:40:42 +00:00
Nemanja Ivanovic c564dc060a [PowerPC] Fix condition for 64-bit rotate when replacing r+r instr with r+i
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).

Patch by Josh Stone.

llvm-svn: 329852
2018-04-11 21:25:44 +00:00
Yonghong Song 149d4d3730 bpf: signal error instead of silent drop for certain invalid asm insn
Currently, an invalid asm insn, either in an asm file or
in an inline asm format, might be silently dropped. This patch
fixed two places where this may happen by
signaling the error so user knows what goes wrong.

The following is an example to demonstrate error messages:

    -bash-4.2$ cat t.c
    int test(void *ctx) {
    #if defined(NO_ERROR)
      asm volatile("r0 = *(u16 *)skb[%0]" : : "i"(2));
    #elif defined(ERROR_1)
      asm volatile("r20 = *(u16 *)skb[%0]" : : "i"(2));
    #elif defined(ERROR_2)
      asm volatile("r0 = *(u16 *)(r1 + ?)" : :);
    #endif
      return 0;
    }
    -bash-4.2$ cat run.sh
    for macro in NO_ERROR ERROR_1 ERROR_2; do
      echo "===== compile for macro" $macro
      clang -D${macro} -O2 -target bpf -emit-llvm -S t.c
      echo "==llc=="
      llc -march=bpf -filetype=obj t.ll
    done
    -bash-4.2$ ./run.sh
    ===== compile for macro NO_ERROR
    ==llc==
    ===== compile for macro ERROR_1
    ==llc==
    <inline asm>:1:2: error: invalid register/token name
            r20 = *(u16 *)skb[2]
            ^
    note: !srcloc = 135
    ===== compile for macro ERROR_2
    ==llc==
    <inline asm>:1:21: error: unexpected token
            r0 = *(u16 *)(r1 + ?)
                               ^
    note: !srcloc = 210
    -bash-4.2$

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 329849
2018-04-11 20:24:52 +00:00
Gabor Buella 2ef36f3571 [X86] Describe wbnoinvd instruction
Similar to the wbinvd instruction, except this
one does not invalidate caches. Ring 0 only.
The encoding matches a wbinvd instruction with
an F3 prefix.

Reviewers: craig.topper, zvi, ashlykov

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D43816

llvm-svn: 329847
2018-04-11 20:01:57 +00:00
Simon Pilgrim 8fc2b49620 [X86][Atom] Convert Atom scheduler model to SchedRW (PR32431)
Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550.

I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here,

There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers.

There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical.

NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild.

Differential Revision: https://reviews.llvm.org/D45486

llvm-svn: 329837
2018-04-11 18:23:01 +00:00
Simon Pilgrim 7f321d8c24 [X86] Generalize X86PadShortFunction to work with TargetSchedModel
Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call.

Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width.

Differential Revision: https://reviews.llvm.org/D45486

llvm-svn: 329834
2018-04-11 18:05:17 +00:00
Artem Belevich 2f8efcf3ca [NVPTX] Removed 'satom' feature which is no longer used.
Differential Revision: https://reviews.llvm.org/D45061

llvm-svn: 329830
2018-04-11 17:51:33 +00:00
Artem Belevich 24e8a680e5 [NVPTX, CUDA] Improved feature constraints on NVPTX target builtins.
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.

Differential Revision: https://reviews.llvm.org/D45061

llvm-svn: 329829
2018-04-11 17:51:19 +00:00
Tim Renouf fd8d4af3bc [AMDGPU] Ensure there are enough registers for wave dispatch
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.

Re-landed after noticing that the buildbot failure from 329808 seemed to
be unrelated.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45503

Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329826
2018-04-11 17:18:36 +00:00
Petar Jovanovic 366857a23a [MIPS GlobalISel] Select add i32, i32
Add the minimal support necessary to lower a function that returns the
sum of two i32 values.
Support argument/return lowering of i32 values through registers only.
Add tablegen for regbankselect and instructionselect.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D44304

llvm-svn: 329819
2018-04-11 15:12:32 +00:00
Yaxun Liu 9381ae9791 [AMDGPU] Fix lowering enqueue_kernel
Two issues were fixed:

runtime has difficulty to allocate memory for an external symbol of a
kernel and set the address of the external symbol, therefore make the runtime
handle of an enqueued kernel an ordinary global variable. Runtime only needs
to store the address of the loaded kernel to the handle and has verified
that this approach works.

handle the situation where __enqueue_kernel* gets inlined therefore
the enqueued kernel may be used through a constant expr instead
of an instruction.

Differential Revision: https://reviews.llvm.org/D45187

llvm-svn: 329815
2018-04-11 14:46:15 +00:00
Tim Renouf 8ca33bfcf3 Revert "[AMDGPU] Ensure there are enough registers for wave dispatch"
This reverts 329808. That change caused a report of a failure in
test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect
it is an expensive-check-only error.

Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0
llvm-svn: 329811
2018-04-11 14:27:41 +00:00
Sander de Smalen c88f9a1a57 [AArch64][AsmParser] Split index parsing from vector list.
Summary:
Place parsing of a vector index into a separate function to reduce
duplication, since the code is duplicated in both the parsing of a
Neon vector register operand and a Neon vector list.

This is patch [2/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45428

llvm-svn: 329809
2018-04-11 14:10:37 +00:00
Tim Renouf f26b723491 [AMDGPU] Ensure there are enough registers for wave dispatch
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45503

Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329808
2018-04-11 14:02:41 +00:00
Simon Pilgrim 89c8a10f7c [X86] Add variable shuffle schedule classes
Split variable index shuffles from immediate index shuffles

WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)

WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)

Differential Revision: https://reviews.llvm.org/D45404

llvm-svn: 329806
2018-04-11 13:49:19 +00:00
Dmitry Preobrazhensky fc715551a3 [AMDGPU][MC][GFX9] Added v_screen_partition_4se_b32
See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845

Differential Revision: https://reviews.llvm.org/D45443

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329801
2018-04-11 13:13:30 +00:00
Francis Visoiu Mistrih 6463922e3a [AArch64] Fix regression after r329691
In r329691, we would choose FP even if the offset wouldn't fit, just
because the offset is smaller than the one from BP. This made many
accesses through FP need to scavenge a register, which resulted in
slower and bigger code for no good reason.

This patch now always picks the offset that fits first, even if FP is
preferred.

llvm-svn: 329797
2018-04-11 12:36:55 +00:00
Sjoerd Meijer ac96d7c4b3 [ARM] FP16 VSEL codegen
This is a follow up of rL327695 to instruction select more variants of VSELGT
and VSELGE, for which it is necessary to custom lower SELECT.

More work is required in this area, which will be addressed soon:
- more variants need to be regression tested, but this depends on the next point.
- first LowerConstantFP need to be adjusted for fp16 values.

Differential Revision: https://reviews.llvm.org/D45205

llvm-svn: 329788
2018-04-11 09:28:04 +00:00
Sander de Smalen 73937b7c9d [AArch64][AsmParser] Unify code for parsing Neon/SVE vectors.
Summary:
Merged 'tryMatchVectorRegister' (specific to Neon) and
'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and
created a generic 'parseVectorKind()' function that returns the #Elements
and ElementWidth of a vector suffix. This reduces the duplication of
this functionality between two the vector implementations.

This is patch [1/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45427

llvm-svn: 329782
2018-04-11 07:36:10 +00:00
Craig Topper 9507fa358c [X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select.
The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency.

llvm-svn: 329774
2018-04-11 04:55:04 +00:00
Craig Topper ee2c1dea4d [X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction.
Previously the code only knew how to handle setcc to a register.

This should fix a crash in the chromium build.

llvm-svn: 329771
2018-04-11 01:09:10 +00:00
Sriraman Tallam d693093a65 GOTPCREL references must always use RIP.
With -fno-plt, global value references can use GOTPCREL and RIP must be used.

Differential Revision: https://reviews.llvm.org/D45460

llvm-svn: 329765
2018-04-10 22:50:05 +00:00
Marek Olsak a9a58fa236 AMDGPU: enable 128-bit for local addr space under an option
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.

Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).

v2: - fix regressions in merge-stores.ll and multiple_tails.ll

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
2018-04-10 22:48:23 +00:00
Geoff Berry 5696e075c3 [AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.

Reviewers: mcrosier

Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45502

llvm-svn: 329761
2018-04-10 21:43:03 +00:00
Jessica Paquette a450ed2352 Recommit r329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation"
This commit fixes the bot failures that were coming up before with r329716.

The fix was to move the check for "isInSection()" inside of the if condition
and emit the error there instead of waiting to get past the unreachable statement.

This should work in debug and release builds now.

llvm-svn: 329746
2018-04-10 19:46:43 +00:00
Amara Emerson e27d5016ef [AArch64] Fix isel failure when BUILD_PAIR nodes are left over.
rdar://39175175

llvm-svn: 329743
2018-04-10 19:01:58 +00:00
Gabor Buella 213edc4a15 [X86] Split up -march=icelake to -client & -server
Reviewers: craig.topper, zvi, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45055

llvm-svn: 329742
2018-04-10 18:59:13 +00:00
Craig Topper 442428540a [X86] Change the name string for the newly add DF flag register to 'dirflag' to match the clobber name supported by clang for MS inline assembly.
This should fix the failure found by Chromium reported here https://bugs.chromium.org/p/chromium/issues/detail?id=831158

The test case will be added in clang.

llvm-svn: 329734
2018-04-10 18:21:04 +00:00
Jessica Paquette c140bbddaf Revert 329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation"
This broke a bunch of bots so I'm reverting while I figure it out.

llvm-svn: 329728
2018-04-10 17:53:41 +00:00
Peter Collingbourne a7d936f0c0 Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF."
Caused a build failure in check-tsan.

llvm-svn: 329718
2018-04-10 16:19:30 +00:00
Jessica Paquette e4b90d82a0 Add missing nullptr check to AArch64MachObjectWriter::recordRelocation
There was missing nullptr check before a call to getSection() in
recordRelocation. This would result in a segfault in code like the attached
test.

This adds the missing check and a test which makes sure we get the expected 
error output.

llvm-svn: 329716
2018-04-10 15:53:28 +00:00
Nicolai Haehnle b1c3b22b4c AMDGPU/MC: Allow disassembling without symbol info
Summary:
We would like the UMR debugging tool[0] to be able to provide
disassembly for currently live waves based on plain memory
dumps, and we want to leverage the LLVM disassembler for this.

This mostly works, except that UMR clearly can't provide real
symbol info, so it wants to set DisInfo == nullptr.

[0] https://cgit.freedesktop.org/amd/umr/

Reviewers: arsenm, rampitec, artem.tamazov, dp

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45477

Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6
llvm-svn: 329715
2018-04-10 15:46:43 +00:00
Chad Rosier af7519e9af Fix spelling. NFC.
llvm-svn: 329709
2018-04-10 14:57:13 +00:00
Simon Pilgrim 95f941117c Fix whitespace indentation. NFCI.
llvm-svn: 329704
2018-04-10 14:21:33 +00:00
Gabor Buella 3eab22d896 [X86] Disable SGX for Skylake Server
Reviewers: craig.topper, zvi, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45057

llvm-svn: 329700
2018-04-10 13:58:57 +00:00
Francis Visoiu Mistrih f2c22050e8 [AArch64] Use FP to access the emergency spill slot
In the presence of variable-sized stack objects, we always picked the
base pointer when resolving frame indices if it was available.

This makes us hit an assert where we can't reach the emergency spill
slot if it's too far away from the base pointer. Since on AArch64 we
decide to place the emergency spill slot at the top of the frame, it
makes more sense to use FP to access it.

The changes here don't affect only emergency spill slots but all the
frame indices. The goal here is to try to choose between FP, BP and SP
so that we minimize the offset and avoid scavenging, or worse, asserting
when trying to access a slot allocated by the scavenger.

Previously discussed here: https://reviews.llvm.org/D40876.

Differential Revision: https://reviews.llvm.org/D45358

llvm-svn: 329691
2018-04-10 11:29:40 +00:00
Tim Renouf 7190a4692a [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading
scratch descriptor from GIT.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 329690
2018-04-10 11:25:15 +00:00
Tim Northover 6a1c51bf6b AArch64: diagnose unpredictable store-exclusive instructions
Much like any written register in load/store instructions, the status register
is not allowed to overlap with any others. So diagnose it like we already do
with the other cases.

llvm-svn: 329687
2018-04-10 11:04:29 +00:00