Commit Graph

49048 Commits

Author SHA1 Message Date
Simon Pilgrim 1c1335a10d [X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2
These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64.

llvm-svn: 342235
2018-09-14 13:31:14 +00:00
Simon Pilgrim 6a47cdbdec [X86][BMI1] Add scheduler class for BLSI/BLSMSK/BLSR BMI1 instructions
llvm-svn: 342234
2018-09-14 13:09:56 +00:00
David Stuttard 20de3e99b5 [AMDGPU] Ensure trig range reduction only used for subtargets that require it
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.

Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+

Also updated the tests to check correct behaviour

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51933

Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
llvm-svn: 342222
2018-09-14 10:27:19 +00:00
Sam Parker 7b84fd7847 [ARM] bottom-top mul support in ARMParallelDSP
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 342210
2018-09-14 08:09:09 +00:00
Jonas Paulsson 77df2f2f38 [SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPM
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.

This involves sext, zext and xor of i1.

The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.

Review: Ulrich Weigand
https://reviews.llvm.org/D51339

llvm-svn: 342207
2018-09-14 06:46:55 +00:00
Tim Renouf c8af6a46fa [AMDGPU] Removed unused method
Summary:
I accidentally left this behind in D50306, and it causes a build warning
when I build with gcc7.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52022

Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c
llvm-svn: 342189
2018-09-13 21:56:25 +00:00
Nirav Dave 59ad1c8457 [X86] Fix register resizings for inline assembly register operands.
When replacing a named register input to the appropriately sized
sub/super-register. In the case of a 64-bit value being assigned to a
register in 32-bit mode, match GCC's assignment.

Reviewers: eli.friedman, craig.topper

Subscribers: nickdesaulniers, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D51502

llvm-svn: 342175
2018-09-13 20:33:56 +00:00
Nirav Dave 2060a16dfd [X86] Cleanup pair returns. NFCI.
llvm-svn: 342174
2018-09-13 20:33:27 +00:00
Ana Pazos 065b088759 [RISCV][MC] Reject bare symbols for the simm6 and simm6nonzero operand types
Summary:
Fixed assertions due to invalid fixup when encoding compressed instructions
 (c.addi, c.addiw, c.li, c.andi) with bare symbols with/without modifiers.
  This matches GAS behavior as well.

This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.

Reviewers: asb

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb

Differential Revision: https://reviews.llvm.org/D52005

llvm-svn: 342160
2018-09-13 18:37:23 +00:00
Ana Pazos b0799dda77 [RISCV] Fix decoding of invalid instruction with C extension enabled.
Summary:
The illegal instruction 0x00 0x00 is being wrongly decoded as
c.addi4spn with 0 immediate.

The invalid instruction 0x01 0x61 is being wrongly decoded as
c.addi16sp with 0 immediate.

This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.

Reviewers: asb

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb

Differential Revision: https://reviews.llvm.org/D51815

llvm-svn: 342159
2018-09-13 18:21:19 +00:00
Sam Clegg 79c054f6b8 [WebAssembly] Fix signature of `main` in FixFunctionBitcasts
Also, add a check to ensure that when main has the expected signature
we do not create a wrapper.

Differential Revision: https://reviews.llvm.org/D51562

llvm-svn: 342157
2018-09-13 17:13:10 +00:00
Sam Parker aaec3c6260 [ARM] Allow truncs as sources in ARM CGP
We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.

Differential Revision: https://reviews.llvm.org/D51978

llvm-svn: 342141
2018-09-13 15:14:12 +00:00
Sam Parker 96f77f142b [ARM] Fix FixConst for ARMCodeGenPrepare
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.

Differential Revision: https://reviews.llvm.org/D52032

llvm-svn: 342140
2018-09-13 14:48:10 +00:00
Matt Arsenault ff987ac6ea AMDGPU: Fix not preserving alignent in call setups
If an argument was passed on the stack, this
was using the default alignment.

I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.

llvm-svn: 342133
2018-09-13 12:14:31 +00:00
Tim Northover c15d47bb01 ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.

This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.

llvm-svn: 342127
2018-09-13 10:28:05 +00:00
Alexander Timofeev 4d302f6911 [AMDGPU] Load divergence predicate refactoring
Differential revision: https://reviews.llvm.org/D51931

    Reviewers: rampitec

llvm-svn: 342120
2018-09-13 09:06:56 +00:00
Simon Atanasyan c49da2e4ed [mips] Enable the mnemonic spell corrector
This implements suggesting alternative mnemonics when an invalid one is
specified. For example `addru $9, $6, 17767` leads to the following
error message:

error: unknown instruction, did you mean: add, addiu, addu, maddu?

Differential revision: https://reviews.llvm.org/D40646

llvm-svn: 342119
2018-09-13 08:38:03 +00:00
Alexander Timofeev 2fb44808b1 [AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed.
Differential revision: https://reviews.llvm.org/D51975

    Reviewers: rampitec

llvm-svn: 342115
2018-09-13 06:34:56 +00:00
Craig Topper f107123a88 [X86] Type legalize v2i32 div/rem by scalarizing rather than promoting
Summary:
Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.

This patch switches type legalization to do the scalarizing itself using i32.

It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.

Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.

Reviewers: RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51325

llvm-svn: 342114
2018-09-13 06:13:37 +00:00
Saleem Abdulrasool aaa72c547b ARM: correct the relocation type for `bl` on WoA
The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction.  A
thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`.
Correct the relocation that we emit in such a case.

Resolves PR38620!  Based on the patch by Jordan Rhee!

llvm-svn: 342109
2018-09-13 04:55:08 +00:00
Thomas Lively 65825cd7c5 Remove isAsCheapAsAMove from v128.const
llvm-svn: 342106
2018-09-13 02:50:57 +00:00
Thomas Lively 17ba6becaa Remove isAsCheapAsAMove from mem ops
llvm-svn: 342105
2018-09-13 02:50:57 +00:00
Thomas Lively 56b34f6c51 [WebAssembly] Add missing SIMD instruction attributes
Summary:
These attributes are copied from equivalent instructions in
WebAssemblyInstrInfo.td.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51518

llvm-svn: 342104
2018-09-13 02:50:56 +00:00
Krzysztof Parzyszek a6d4fc0e29 [Hexagon] Use shuffles when lowering "gather" shufflevectors
Shufflevector instructions in LLVM IR that extract a subset of elements
of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs.
This will avoid expanding them into constly extracts and inserts.

llvm-svn: 342091
2018-09-12 22:14:52 +00:00
Krzysztof Parzyszek f853741142 [Hexagon] Improve the selection algorithm in scalarizeShuffle
Use topological ordering for newly generated nodes.

llvm-svn: 342090
2018-09-12 22:10:58 +00:00
Heejin Ahn 300f42fbce [WebAssembly] Make tied inline asm operands work again
Summary:
rL341389 broke code with tied register operands in inline assembly. For
example, `asm("" : "=r"(var) : "0"(var));`
The code above specifies the input operand to be in the same register
with the output operand, tying the two register. This patch makes this
kind of code work again.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51991

llvm-svn: 342084
2018-09-12 21:34:39 +00:00
Krzysztof Parzyszek cd95e03cf0 [Hexagon] Use legalized type for extracted elements in scalarizeShuffle
Scalarization of a shuffle will break up the source vectors into individual
elements, and use them to assemble the resulting vector. An element type of
a legal vector type may not necessarily be a legal scalar type, so make
sure that the extracted values are extended to a legal scalar type.

llvm-svn: 342079
2018-09-12 20:58:48 +00:00
Konstantin Zhuravlyov 6e551e0e49 AMDGPU: Print all kernel descriptor directives (including the ones with default values)
Change by Tony Tye

Differential Revision: https://reviews.llvm.org/D51954

llvm-svn: 342077
2018-09-12 20:25:39 +00:00
Konstantin Zhuravlyov 71e43ee47d AMDGPU: Re-apply r341982 after fixing the layering issue
Move isa version determination into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

llvm-svn: 342069
2018-09-12 18:50:47 +00:00
Thomas Lively ebd4c906d8 [WebAssembly] SIMD comparisons
Summary:
Match the ordering semantics of non-vector comparisons. For
floating point comparisons that do not correspond to instructions, the
tests check that some vector comparison instruction was emitted but do
not care about the full implementation.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51765

llvm-svn: 342064
2018-09-12 17:56:00 +00:00
Diogo N. Sampaio 01b916e188 [ARM] Tighten f64<->f16 conversion requirements
Fix missing Requires fields.

Patch by Bernard Ogden (bogden)

Reviewers: SjoerdMeijer, javed.absar, t.p.northover	

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D51631

llvm-svn: 342061
2018-09-12 16:24:43 +00:00
Craig Topper 2262613532 [X86] Remove isel patterns for ADCX instruction
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.

I don't think gcc will generate this instruction either.

Fixes PR38852.

Differential Revision: https://reviews.llvm.org/D51754

llvm-svn: 342059
2018-09-12 15:47:34 +00:00
Sander de Smalen 2d77e788f2 [AArch64] Implement aarch64_vector_pcs codegen support.
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51479

llvm-svn: 342049
2018-09-12 12:10:22 +00:00
Sam Parker 1187911b0b [ARM] Follow-up to rL342033
Fixed typo which can cause segfault.

llvm-svn: 342040
2018-09-12 09:58:56 +00:00
Sander de Smalen 7140363cd0 [AArch64] NFC: Refactoring to prepare for vector PCS.
This patch refactors several parts of AArch64FrameLowering
so that it can be easily extended to support saving/restoring
of FPR128 (Q) registers.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51478

llvm-svn: 342038
2018-09-12 09:44:46 +00:00
Sam Parker a023c7a9cb [ARM] Exchange MAC operands in ARMParallelDSP
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.

AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.

Differential Revision: https://reviews.llvm.org/D51424

llvm-svn: 342033
2018-09-12 09:17:44 +00:00
Sam Parker 569b24549e [ARM] Allow bitcasts in ARMCodeGenPrepare
Allow bitcasts in the use-def chains, treating them as sources.

Differential Revision: https://reviews.llvm.org/D50758

llvm-svn: 342032
2018-09-12 09:11:48 +00:00
Sander de Smalen 4dbc512676 [AArch64] Add parsing of aarch64_vector_pcs attribute.
This patch adds parsing support for the 'aarch64_vector_pcs'
calling convention attribute to calls and function declarations.

More information describing the vector ABI and procedure call standard
can be found here:

  https://developer.arm.com/products/software-development-tools/\
                            hpc/arm-compiler-for-hpc/vector-function-abi

Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D51477

llvm-svn: 342030
2018-09-12 08:54:06 +00:00
Ilya Biryukov 95066496d0 Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."
This reverts commit r341982.

The change introduced a layering violation. Reverting to unbreak
our integrate.

llvm-svn: 342023
2018-09-12 07:05:30 +00:00
Craig Topper dc32e91bc6 [X86] Teach X86SelectionDAGInfo::EmitTargetCodeForMemcpy about GNUX32
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.

Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.

Reviewers: rnk, efriedma, MatzeB, echristo

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51893

llvm-svn: 342015
2018-09-12 01:57:22 +00:00
Konstantin Zhuravlyov 941615e4c8 AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination
into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

Differential Revision: https://reviews.llvm.org/D51890

llvm-svn: 341982
2018-09-11 18:56:51 +00:00
Craig Topper 8238580aae [X86] Prefer unpckhpd over movhlps in isel for fake unary cases
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.

This patch removes the hack.

This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.

Differential Revision: https://reviews.llvm.org/D49499

llvm-svn: 341973
2018-09-11 17:57:27 +00:00
Craig Topper cc9efaffad [X86] Teach X86FastISel::X86SelectRet to use EAX for the sret pointer in GNUX32
GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value.

Fixes ones of the failures from PR38865.

Differential Revision: https://reviews.llvm.org/D51940

llvm-svn: 341972
2018-09-11 17:57:23 +00:00
Josh Stone aca532f14d Test commit: remove trailing whitespace
llvm-svn: 341966
2018-09-11 17:28:43 +00:00
Craig Topper d7362a3e5f [X86] Correct the one use check from r341915.
The one use check should be on the bitcast, not the input to the bitcast.

llvm-svn: 341956
2018-09-11 16:05:03 +00:00
Simon Atanasyan 16c2311c59 [MIPS] Fix illegal type assert in single-float mode
An fp_to_sint node would be incorrectly lowered to a TruncIntFP node in
single-float mode. This would trigger an "Unexpected illegal type!"
assert.

Patch by Dan Ravensloft.

Differential revision: https://reviews.llvm.org/D51810

llvm-svn: 341952
2018-09-11 15:32:47 +00:00
Sam Parker 01db2983cd [ARM] Add smlald support in ARMParallelDSP
Search from i64 reducing phis, as well as i32, to allow the
generation of smlald instructions.

Differential Revision: https://reviews.llvm.org/D51101

llvm-svn: 341941
2018-09-11 14:01:22 +00:00
Sam Parker 945604d511 [ARM] Enable ARMCodeGenPrepare by default
We've had the pass enabled downstream for a couple of weeks and it
seems to be okay, so enable it by default.

Differential Revision: https://reviews.llvm.org/D51920

llvm-svn: 341932
2018-09-11 12:45:43 +00:00
Alexander Timofeev db7ee7660a [AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed
Differential revision: https://reviews.llvm.org/D51734
Reviewers: rampitec

llvm-svn: 341928
2018-09-11 11:56:50 +00:00
Simon Atanasyan 32d8d1bf04 [mips] Add a pattern for 64-bit GPR variant of the `rdhwr` instruction
MIPS ISAs start to support third operand for the `rdhwr` instruction
starting from Revision 6. But LLVM generates assembler code with
three-operands version of this instruction on any MIPS64 ISA. The third
operand is always zero, so in case of direct code generation we get
correct code.

This patch fixes the bug by adding an instruction alias. The same alias
already exists for 32-bit ISA.

Ideally, we also need to reject three-operands version of the `rdhwr`
instruction in an assembler code if ISA revision is less than 6. That is
a task for a separate patch.

This fixes PR38861 (https://bugs.llvm.org/show_bug.cgi?id=38861)

Differential revision: https://reviews.llvm.org/D51773

llvm-svn: 341919
2018-09-11 09:57:25 +00:00