LanaiMCCodeEmitter.cpp was not using any APIs from Lanai.h, and was only
including it for transitive dependencies. Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Lanai target library and
the MCTargetDesc library).
llvm-svn: 362394
HexagonInstPrinter.cpp was not using any APIs from HexagonAsmPrinter.h.
Doing so is problematic from include-what-you-use perspective, but it is
also a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362389
HexagonMCInstrInfo.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362387
HexagonMCCodeEmitter.cpp was not using any APIs from Hexagon.h. Doing
so is problematic from include-what-you-use perspective, but it is also
a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362386
HexagonMCCompound.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362385
HexagonShuffler.cpp was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies. Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).
llvm-svn: 362384
HexagonMCChecker.cpp was not using any APIs from Hexagon.h. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362383
HexagonMCTargetDesc.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362382
HexagonMCShuffler.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362381
HexagonELFObjectWriter.cpp was not using any APIs from Hexagon.h, and
was only including it for transitive dependencies. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362376
HexagonAsmBackend.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362372
HexagonAsmParser.cpp was not using any APIs from Hexagon.h. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the AsmParser library).
llvm-svn: 362370
HexagonShuffler.h was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies. Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).
llvm-svn: 362369
BPFMCTargetDesc.cpp was not using any APIs from BPF.h. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
BPF target library and the MCTargetDesc library).
llvm-svn: 362368
Summary:
- pr42062
When compiling for MinSize,
ARMTargetLowering::LowerCall decides to indirect
multiple calls to a same function. However,
it disconsiders the limitation that thumb1
indirect calls require the callee to be in a
register from r0 to r3 (llvm limiation).
If all those registers are used by arguments, the
compiler dies with "error: run out of registers
during register allocation".
This patch tells the function
IsEligibleForTailCallOptimization if we intend to
perform indirect calls, as to avoid tail call
optimization.
Reviewers: dmgreen, efriedma
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62683
llvm-svn: 362366
Summary:
LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z.
So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS.
Reviewers: dylanmckay
Reviewed By: dylanmckay
Subscribers: dylanmckay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62300
llvm-svn: 362351
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.
Differential Revision: https://reviews.llvm.org/D59188
llvm-svn: 362324
The `cfcmsa` and `ctcmsa` instructions accept index of MSA control
register. The MIPS64 SIMD Architecture define eight MSA control
registers. But register index for `cfcmsa` and `ctcmsa` instructions
might be any number in 0..31 range. If the index is greater then 7,
`cfcmsa` writes zero to the destination registers and `ctcmsa` does
nothing [1].
[1] MIPS Architecture for Programmers Volume IV-j:
The MIPS64 SIMD Architecture Module
https://www.mips.com/?do-download=the-mips64-simd-architecture-module
Differential Revision: https://reviews.llvm.org/D62597
llvm-svn: 362299
If we would allow register coalescing on PTRDISPREGS class then register
allocator can lock Z register to some virtual register. Larger instructions
requiring a memory acces then fail during the register allocation phase since
there is no available register to hold a pointer if Y register was already
taken for a stack frame. This patch prevents it by keeping Z register
spillable. It does it by not allowing coalescer to lock it.
Original discussion on https://github.com/avr-rust/rust/issues/128.
llvm-svn: 362298
CodeView has its own register map which is defined in cvconst.h. Missing this
mapping before saving register to CodeView causes debugger to show incorrect
value for all register based variables, like variables in register and local
variables addressed by register (stack pointer + offset).
This change added mapping between LLVM register and CodeView register so the
correct register number will be stored to CodeView/PDB, it aso fixed the
mapping from CodeView register number to register name based on current
CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed.
Differential Revision: https://reviews.llvm.org/D62608
llvm-svn: 362280
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM (r353563),
a few checks for INLINEASM needed to be updated to check for either
case.
pr/41999
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62403
llvm-svn: 362278
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.
Differential Revision: https://reviews.llvm.org/D62707
llvm-svn: 362239
In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%.
condc = conda || condb
br condc, label %target, label %fallthrough
It can be transformed to following,
br conda, label %target, label %newbb
newbb:
br condb, label %target, label %fallthrough
Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%.
This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged.
Differential Revision: https://reviews.llvm.org/D62430
llvm-svn: 362237
Summary:
A three sources variant of the TBL instruction is added to the existing
SVE instruction in SVE2. This is implemented with minor changes to the
existing TableGen class. TBX is a new instruction with its own
definition.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62600
llvm-svn: 362214
Handle position independent code for MIPS32.
When callee is global address, lower call will emit callee
as G_GLOBAL_VALUE and add target flag if needed.
Support $gp in getRegBankFromRegClass().
Select G_GLOBAL_VALUE, specially handle case when
there are target flags attached by lowerCall.
Differential Revision: https://reviews.llvm.org/D62589
llvm-svn: 362210
Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo.
This way functions used for handling position independent code during
instruction selection, getGlobalBaseReg and initGlobalBaseReg,
end up in same class.
Differential Revision: https://reviews.llvm.org/D62586
llvm-svn: 362206
Lower call for callee that is register for MIPS32.
Register should contain callee function address.
Differential Revision: https://reviews.llvm.org/D62585
llvm-svn: 362204
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.
Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.
This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?
llvm-svn: 362203
DAG combine will usually fold fpextend+load to an fp extload anyway. So the
256 and 512 patterns were probably unnecessary. The 128 bit pattern was special
in that it looked for a v4f32 load, but then used it in an instruction that
only loads 64-bits. This is bad if the load happens to be volatile. We could
probably make the patterns volatile aware, but that's more work for something
that's probably rare. The peephole pass might kick in and save us anyway. We
might also be able to fix this with some additional DAG combines.
This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be
used. Previously we looked for the unlikely vselect+fpextend+load.
llvm-svn: 362199
This makes the 5 address operands come first. And the data operand comes last.
This matches the operand order the instruction is created with. It's also the
expected order in X86MCInstLower. So everything appeared to work, but the
operands didn't match their declared type.
Fixes a -verify-machineinstrs failure.
Also remove the isel patterns from these instructions since they should only
be used for stack spills and reloads. I'm not even sure what types the patterns
were looking for to match.
llvm-svn: 362193
The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags.
The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64.
llvm-svn: 362175
The LoadExt table defaults to all combinations being Legal. For
vector types, only src VTs with an i1 element type were ever changed.
So we don't need to mark them legal manually.
llvm-svn: 362170
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.
That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.
Differential Revision: https://reviews.llvm.org/D62572
Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
Most of the code used for finding a 'narrow' sequence is not used,
so I've removed it and simplified the calls from the smlad matcher.
llvm-svn: 362104
Now the NEON ones have a prefix "NEON_", and the VFP ones have a
prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be
made to match both of _those_ classes of VMAXNM without also matching
the MVE ones that are going to be introduced soon. NFCI.
Patch by Simon Tatham.
Differential Revision: https://reviews.llvm.org/D60700
llvm-svn: 362097
This adds:
- LLVM subtarget features to make all the new instructions conditional on,
- CPU and FPU names for use on clang's command line, with default FPUs set
so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right
FPU features,
- architecture extension names "mve" and "mve.fp",
- ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE
(a new actual tag).
Patch mostly by Simon Tatham.
Differential Revision: https://reviews.llvm.org/D60698
llvm-svn: 362090
The MVE extension in Arm v8.1-M permits the use of some move, load and
store isntructions which access the FP registers, even if there's no
actual FP support in the processor (in particular, if you have the
integer-only version of MVE).
Therefore, we need separate subtarget features to condition those
instructions on, which are implied by both FP and MVE but are not part
of either.
Patch mostly by Simon Tatham.
Differential Revision: https://reviews.llvm.org/D60694
llvm-svn: 362088
We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it.
Differential Revision: https://reviews.llvm.org/D62449
llvm-svn: 362081
MVE architecturally specifies a 'beat' system in which a vector
instruction executed now will complete its actual operation over the
next four cycles, so it can overlap with the execution of the previous
and next MVE instruction.
This makes it generally an advantage to avoid moving values back and
forth between MVE registers and anywhere else, if there's any sensible
way to do the same processing in whatever register type the values
already occupied.
That's just what the 'execution domain' system is supposed to achieve.
So here we add a new execution domain which will contain all the MVE
vector instructions when they are added.
Patch by: Simon Tatham
Differential Revision: https://reviews.llvm.org/D60703
llvm-svn: 362068
The new ARMPredicates.td is included from ARM.td, early enough that
the predicate definitions are already in scope when ARMSchedule.td is
included. This will make it possible to refer to them in
UnsupportedFeatures fields of scheduling models.
NFC: the chunk of Tablegen being moved here is copied and pasted
verbatim.
Patch by: Simon Tatham
Differential Revision: https://reviews.llvm.org/D60693
llvm-svn: 361958
Summary:
Patch adds support for the following instructions:
* EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR
Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the
preferred disassembly is .D.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62387
llvm-svn: 361936
Summary:
Patch adds support for the following instructions:
SVE2 floating-point pairwise operations:
* FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62383
llvm-svn: 361933
avoid static check fail
RegClassOrBank is an object of RegClassOrRegBank, which is defined as
using llvm::RegClassOrRegBank = typedef PointerUnion<const
TargetRegisterClass *, const RegisterBank *>
so control flow can not get here. Use ""llvm_unreachable" here to avoid
"null pointer" confusion.
Patch by Shengchen Kan (skan)
Differential Revision: https://reviews.llvm.org/D62006
Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 361912
D18885 emitted 5 bytes for call *foo@tlsdesc(%rax). It should use the
2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning
of the call instruction.
The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work:
0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <.text+0x7>
3: R_X86_64_GOTPC32_TLSDESC a-0x4
7: ff 10 callq *(%rax)
7: R_X86_64_TLSDESC_CALL a
=>
0: 48 c7 c0 fc ff ff ff mov $0xfffffffffffffffc,%rax
7: 66 90 xchg %ax,%ax
Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is
seen.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D62512
llvm-svn: 361910
Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and
factor out opcode selection for G_FCMP into its own function.
Add a test to show that we don't do this with other immediates.
Differential Revision: https://reviews.llvm.org/D62539
llvm-svn: 361888
Summary:
This adds support for translation of LLVM IR fence instruction. We
convert a singlethread fence to a pseudo compiler barrier which becomes
0 instructions in final binary, and a thread fence to an idempotent
atomicrmw instruction to a memory address.
Reviewers: dschuff, jfb, sunfish, tlively
Subscribers: sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D50277
llvm-svn: 361884
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.
Reviewed By: SjoerdMeijer, mstorsjo
Differential Revision: https://reviews.llvm.org/D62018
llvm-svn: 361877
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D62017
llvm-svn: 361875
Summary:
- There's a regression due to the cross-block RC assignment. Use the
proper way to derive the output register RC in inline asm.
Reviewers: rampitec, alex-t
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62537
llvm-svn: 361868
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.
llvm-svn: 361848
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
Otherwise, that promotes that scalar register into vector one, which
breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
that (lane mask) scalar register is legalized firstly before its
definition, e.g., due to the mismatch block placement and its
topological order or loop. In that cases, the legalization of PHI
introduces the use of that scalar register as `vreg_1`.
Reviewers: rampitec, nhaehnle, arsenm, alex-t
Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62492
llvm-svn: 361847
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.
Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.
A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60691
llvm-svn: 361845
If we don't have VLX then 256-bit SET0 should be lowered
to VPXOR with ZMM registers. This restores functionality
accidentally removed by r309926.
Differential Revision: https://reviews.llvm.org/D62415
llvm-svn: 361843
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.
Differential Revision: https://reviews.llvm.org/D62498
llvm-svn: 361822
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."
Differential Revision: https://reviews.llvm.org/D62506
llvm-svn: 361815
Summary:
Patch adds support for the following instructions:
SVE2 crypto constructive binary operations:
* SM4EKEY, RAX1
SVE2 crypto destructive binary operations:
* AESE, AESD, SM4E
SVE2 crypto unary operations:
* AESMC, AESIMC
AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes. SM4E and
SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62307
llvm-svn: 361797
Summary:
Patch adds support for the following instructions:
SVE2 histogram generation (segment):
* HISTSEG
SVE2 histogram generation (vector):
* HISTCNT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62306
llvm-svn: 361796
Summary:
Patch adds support for the following instructions:
SVE2 bitwise exclusive-or interleaved:
* EORBT, EORTB
SVE2 bitwise permute:
* BEXT, BDEP, BGRP
SVE2 bitwise shift left long:
* SSHLLB, SSHLLT, USHLLB, USHLLT
SVE2 integer add/subtract interleaved long:
* SADDLBT, SSUBLBT, SSUBLTB
BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other
instructions in this group are enabled with +sve2.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62304
llvm-svn: 361795
AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was
only including it for transitive dependencies. Doing so is problematic
from include-what-you-use perspective, but it is also a layering issue
(it creates a dependency cycle between the primary AArch64 target
library and the MCTargetDesc library).
llvm-svn: 361774
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5 Divergence driven ISel. Assign register class for cross block values according to the divergence.
llvm-svn: 361770
The variables in BTF DataSec type encode in-section offset.
R_BPF_NONE should be generated instead of R_BPF_64_32.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D62460
llvm-svn: 361742
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was reverted because of the build failure.
The reason was mlformed patch.
Build failure fixed.
llvm-svn: 361741
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.
Differential Revision: https://reviews.llvm.org/D62326
llvm-svn: 361718
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.
Differential Revision: https://reviews.llvm.org/D62325
llvm-svn: 361717
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.
There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.
llvm-svn: 361716
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.
Differential Revision: https://reviews.llvm.org/D62448
llvm-svn: 361704
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.
This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.
One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.
Differential Revision: https://reviews.llvm.org/D61472
llvm-svn: 361691
This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.
Differential Revision: https://reviews.llvm.org/D62360
llvm-svn: 361690
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.
We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:
- For a def instruction, it only matters if the def instruction must always
output a value stored on a FPR
- For a use instruction, it only matters if the use instruction must always
only take in values stored in FPRs
This adds two new functions:
- onlyUsesFP
- onlyDefinesFP
Then we can use those when we're checking the uses/defs instead.
Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.
Differential Revision: https://reviews.llvm.org/D62426
llvm-svn: 361679
Factor it out into a function, and replace places where we had the same check
with the new function.
Differential Revision: https://reviews.llvm.org/D62421
llvm-svn: 361677
Summary:dd
This patch implements call lowering for calls without parameters
on AIX as initial support.
Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma
Differential Revision: https://reviews.llvm.org/D61948
llvm-svn: 361669
The fcsel and csel instructions differ in only the register banks they work on.
So, they're entirely interchangeable otherwise.
With this in mind, this does two things:
- Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as
the outputs.
- Teach it to choose the best register bank mapping based off the constraints
of the inputs and outputs.
The "best" in this case means the one that requires the smallest number of
copies to properly emit a fcsel/csel.
For example, if the inputs are all already going to be on FPRs, we should
emit a fcsel, even if the output is a GPR. This costs one copy to produce the
result, but saves us from copying the inputs into GPRs.
Also update the regbank-select.mir to check that we end up with the right
select instruction.
Differential Revision: https://reviews.llvm.org/D62267
llvm-svn: 361665
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
pr/41999
Reviewers: t.p.northover, peter.smith
Reviewed By: peter.smith
Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62402
llvm-svn: 361661
Summary:
We were observing failures for arm32 allyesconfigs of the Linux kernel
with the asm goto Clang patch, where ldr's were being generated to
offsets too far away to encode in imm12.
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
pr/41999
Link: https://github.com/ClangBuiltLinux/linux/issues/490
Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover
Reviewed By: peter.smith
Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62400
llvm-svn: 361659
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.
llvm-svn: 361649
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
For the situation, where we generate the following code:
crxor 8, 8, 8
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 8, 8
cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.
This patch will optimize it to:
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 1, 1
Patch By: Victor Huang (NeHuang)
Differential Revision: https://reviews.llvm.org/D62044
llvm-svn: 361632
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62145
llvm-svn: 361619
Summary:
This patch adds support for the SVE2 saturating/rounding bitwise shift
left (predicated) group of instructions:
* SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL,
SQSHLR, UQSHLR, SQRSHLR, UQRSHLR
Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62140
llvm-svn: 361612
Summary:
Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The
encodings are not affected as bit 20 is defined by the opc bits
and this was overwriting the earlier error of setting bit 20 to 0.
Raised by Momchil: https://reviews.llvm.org/D62130
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62292
llvm-svn: 361609
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.
llvm-svn: 361608
The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function
to check that breaking down constant multiplications into a series
of shifts, adds, and subs is efficient. Unfortunately, this function
does not check maximum number of steps on all paths of the algorithm.
This patch fixes this bug.
Fix for PR41929.
Differential Revision: https://reviews.llvm.org/D62166
llvm-svn: 361606
This pass wasn't printing any messages at all, which I find really inconvenient
while debugging/tracing things. It now dumps the before and after of expanded
instructions. It doesn't do this yet for all instructions, but this is a good
start I guess.
Differential Revision: https://reviews.llvm.org/D62297
llvm-svn: 361604
When we are scheduling the load and addi, if all other heuristic didn't take effect,
we will try to schedule the addi before the load, to hide the latency, and avoid the
true dependency added by RA. And this only take effects for Power9.
Differential Revision: https://reviews.llvm.org/D61930
llvm-svn: 361600
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.
Fixes PR41997
Reviewers: mgrang, efriedma
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62344
llvm-svn: 361585
We were assuming a much larger possible per-wave visible stack
allocation than is possible:
faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)
Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made
this configurable.
llvm-svn: 361541
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61680
llvm-svn: 361527
Summary:
These features will both be implemented soon, so I thought I would
save time by adding the boilerplate for both of them at the same time.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D62047
llvm-svn: 361516
This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in
the initial-exec and global-dynamic TLS models respectively when
addressing a global. The pseudo instructions are expanded in the
assembly parser.
llvm-svn: 361499
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.
Differential Revision: https://reviews.llvm.org/D62254
llvm-svn: 361462
Summary:
In this patch, `ISD::RETURNADDR` is lowered on the emscripten target
to the new Emscripten runtime function `emscripten_return_address`, which
implements the functionality.
Patch by Guanzhong Chen
Reviewers: tlively, aheejin
Reviewed By: tlively
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62210
llvm-svn: 361454
In general dynamic/local dynamic TLS models, with -fno-plt,
* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
Note, on x86, if we can get rid of %ebx as the PIC register,
it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`
Reorganize the code by separating 32-bit and 64-bit.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D62106
llvm-svn: 361453
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.
This patch instaed just disables the immediate 0 matchin the VEXTRACT
patterns. This each of the component pieces of the larger patterns will
match by themselves.
This found a bug of sorts were we didn't use 128-bit store for 512->128
extract on KNL. Its unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.
llvm-svn: 361431
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
Specifically, Simple Register Coalescing creates following conversion :
undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;
It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.
llvm-svn: 361417
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.
For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.
llvm-svn: 361405
When the tiny code model is requested for a target machine that does not
support this, we get an error message (which is nice) but also this diagnostic
and request to submit a bug report:
fatal error: error in backend: Target does not support the tiny CodeModel
[Inferior 2 (process 31509) exited with code 0106]
clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
(gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
Target: arm-arm-none-eabi
Thread model: posix
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
clang-9: note: diagnostic msg:
But this is not a bug, this is a feature. :-) Not only is this not a bug, this
is also pretty confusing. This patch causes just to print the fatal error and
not the diagnostic:
fatal error: error in backend: Target does not support the tiny CodeModel
Differential Revision: https://reviews.llvm.org/D62236
llvm-svn: 361370
Summary:
For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2].
The following results are expected:
ELFv1 when using:
-target powerpc64-unknown-freebsd12.0
-target powerpc64-unknown-freebsd12.0 -mabi=elfv1
-target powerpc64-unknown-freebsd12.0-elfv1
ELFv2 when using:
-target powerpc64-unknown-freebsd12.0 -mabi=elfv2
-target powerpc64-unknown-freebsd12.0-elfv2
[1] https://wiki.freebsd.org/powerpc/llvm-elfv2
[2] https://clang.llvm.org/docs/CrossCompilation.html
Patch by Alfredo Dal'Ava Júnior!
Differential Revision: https://reviews.llvm.org/D61950
llvm-svn: 361355
The Armv8.2-A crypto extensions all defaulted to true, but should default to
false, like all the other extensions.
Differential Revision: https://reviews.llvm.org/D62180
llvm-svn: 361354
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).
Differential Revision: https://reviews.llvm.org/D62220
llvm-svn: 361352
CET-IBT enabled
Return-twice functions will indirectly jump after the caller's position.
So when CET-IBT is enable, we should make sure these is endbr*
instructions follow these Return-twice function caller. Like GCC does.
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D61881
llvm-svn: 361342
r360889 added new llround builtin functions. This patch adds their
signatures for the WebAssembly backend.
It also adds wasm32 support to utils/update_llc_test_checks.py, since
that's the script other targets are using for their testcases for this
feature.
Differential Revision: https://reviews.llvm.org/D62207
llvm-svn: 361327
On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local.
The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point.
In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference.
This change keeps track of these assignments, and update all needed st_other fields when finish() is called.
Patch by Leandro Lupori!
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D56586
llvm-svn: 361237
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.
Fixes PR41951
Reviewers: t.p.northover, dmgreen, samparker
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D60690
llvm-svn: 361235
PrepareConstants step converts add/sub with 'negative' immediates to
sub/add with a 'positive' imm to make promotion more simple. nuw
already states that the add shouldn't cause an unsigned wrap, so
it shouldn't need any tweaking. Plus, we also don't allow a sub with
a 'negative' immediate to be safe wrap, so this functionality has
been removed. The PrepareConstants step now just handles the add
instructions that we've determined would be safe if they wrap around
zero.
Differential Revision: https://reviews.llvm.org/D62057
llvm-svn: 361227
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.
When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.
This approach was recommended by Eli Friedman.
Originally reported in https://github.com/avr-rust/rust/issues/129.
Patch by Carl Peto.
Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma
Reviewed By: efriedma
Subscribers: JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62003
llvm-svn: 361222
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.
Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.
llvm-svn: 361202
A std::array is implemented as a template with an array
inside a struct. Older versions of clang, like 3.6,
require an extra set of curly braces around std::array
initializations to avoid warnings.
The C++ language was changed regarding this by CWG 1270.
So more modern tool chains does not complaing even if
leaving out one level of braces.
llvm-svn: 361171
Summary:
This patch adds support for the integer pairwise add and accumulate long
instructions SADALP/UADALP. These instructions are predicated.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62001
llvm-svn: 361154
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
Summary:
This patch adds support for the predicated integer halving add/sub
instructions:
* SHADD, UHADD, SRHADD, URHADD
* SHSUB, UHSUB, SHSUBR, UHSUBR
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D62000
llvm-svn: 361136
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation. Expand and correct test cases to cover variants of
s_waitcnt.
Reviewers: nhaehnle, rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62058
llvm-svn: 361124
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.
This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.
llvm-svn: 361081
Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns.
llvm-svn: 361052
Summary:
In order to combine memory operations efficiently, the load/store
optimizer might move some instructions around. It's usually safe
to move instructions down past the merged instruction because the
pass checks if memory operations can be re-ordered.
Though, the current logic doesn't handle Write-after-Write hazards.
This fixes a reflection issue with Monster Hunter World and DXVK.
v2: - rebased on top of master
- clean up the test case
- handle WaW hazards correctly
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130
Original patch by Samuel Pitoiset.
Reviewers: tpr, arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D61313
llvm-svn: 361008
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:
* SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D61997
llvm-svn: 361005
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:
* SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D61951
llvm-svn: 361003
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:
* SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D61936
llvm-svn: 361002
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.
llvm-svn: 360990
Replace the member variable Target with Triple
Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit.
Delete redundant parameters
llvm-svn: 360986
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.
See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D62014
llvm-svn: 360983
Summary:
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D61973
llvm-svn: 360981
R_ARM_NONE can be used to create references among sections. When
--gc-sections is used, the referenced section will be retained if the
origin section is retained.
Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is
ubiquitous on ELF and COFF, and probably available on many other binary
formats. See D62014.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D61992
llvm-svn: 360980
Make sure to not unroll a vector division/remainder (with a constant splat
divisor) after type legalization, since the scalar type may then be illegal.
Review: Ulrich Weigand
https://reviews.llvm.org/D62036
llvm-svn: 360965
These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64
and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These
are covered in clang/test, but not llvm/test.
llvm-svn: 360960
In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional
jumps, which indicates the offset should be a "short jump" (8-bit immediate
offset from EIP, -128 to +127). This patch expands to all recognized Jcc
condition codes, and removes the inline restriction.
Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a
couple of Jcc are actually checked, and only inline (i.e., not when using the
integrated assembler for asm sources). A quick search through asm-containing
libraries at hand shows a pretty broad range of Jcc conditions spelled with
"short."
GAS ignores the "short" modifier, and instead uses an encoding based on the
given immediate. MS inline seems to do the same, and I suspect MASM does, too.
NASM will yield an error if presented with an out-of-range immediate value.
Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not
fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY
Differential Revision: https://reviews.llvm.org/D61990
llvm-svn: 360954
This better matches the verbiage in Intel documentation, and should help avoid
confusion between these two different kinds of values, both of which are parsed
from mnemonics.
llvm-svn: 360953
Summary:
This refactors four pieces of code that create SDNodes for references to
symbols:
- normal global address lowering (LEA, MOV, etc)
- callee global address lowering (CALL)
- external symbol address lowering (LEA, MOV, etc)
- external symbol address lowering (CALL)
Each of these pieces of code need to:
- classify the reference
- lower the symbol
- emit a RIP wrapper if needed
- emit a load if needed
- add offsets if needed
I think handling them all in one place will make the code easier to
maintain in the future.
Reviewers: craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61690
llvm-svn: 360952
This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate
in patterns without masking.
llvm-svn: 360915
This is the conservatively correct default. It is always safe to
assume xnack is enabled, but not the converse.
Introduce a feature to blacklist targets where xnack can never be
meaningfully enabled. I'm not sure the targets this is applied to is
100% correct.
llvm-svn: 360903
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lround/llround generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
llvm-svn: 360889
Summary:
The complex DOT instructions perform a dot-product on quadtuplets from
two source vectors and the resuling wide real or wide imaginary is
accumulated into the destination register. The instructions come in two
forms:
Vector form, e.g.
cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets,
accumulating results in 32-bit elements. The
complex numbers in the second source vector are
rotated by 90 degrees.
cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets,
accumulating results in 64-bit elements.
The complex numbers in the second source
vector are rotated by 180 degrees.
Indexed form, e.g.
cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets,
with specified quadtuplet from second source vector,
accumulating results in 32-bit elements.
cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets,
with specified quadtuplet from second source vector,
accumulating results in 64-bit elements.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer, rovka
Differential Revision: https://reviews.llvm.org/D61903
llvm-svn: 360870
Summary:
Add support for the following instructions:
* MUL (indexed and unpredicated vectors forms)
* SQDMULH (indexed and unpredicated vectors forms)
* SQRDMULH (indexed and unpredicated vectors forms)
* SMULH (unpredicated, predicated form added in SVE)
* UMULH (unpredicated, predicated form added in SVE)
* PMUL (unpredicated)
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer, rovka
Differential Revision: https://reviews.llvm.org/D61902
llvm-svn: 360867
If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel.
Differential Revision: https://reviews.llvm.org/D61047
llvm-svn: 360823
Summary:
Otherwise, we emit directives for CFI without any actual CFI opcodes to
go with them, which causes tools to malfunction. The technique is
similar to what the x86 backend already does.
Fixes https://bugs.llvm.org/show_bug.cgi?id=40876
Patch by: froydnj (Nathan Froyd)
Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric
Reviewed By: rnk
Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61960
llvm-svn: 360816
These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the
element size is always known.
One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in
X86ISelLowering and probably low value.
llvm-svn: 360815
Summary:
SGPR in CC can be either hw initialized or set by other chained shaders
and so this increases the SGPR count availalbe to CC to 105.
Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61261
llvm-svn: 360778
The new cortex-m schedule in rL360768 helps performance, but can increase the
amount of high-registers used. This, on average, ends up increasing the
codesize by a fair amount (because less instructions are converted from T2 to
T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use
the existing DAG scheduler with the RegPressure scheduling preference (at least
until the issues around T2 vs T1 instructions can be improved).
I have also made sure that the Sched::RegPressure dag scheduler is always
chosen for MinSize.
The test shows one case where we increase the number of registers used.
Differential Revision: https://reviews.llvm.org/D61882
llvm-svn: 360769
This patch adds a simple Cortex-M4 schedule, renaming the existing M3
schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM:
https://developer.arm.com/docs/ddi0439/latest
Most of these are 1, with the important exception being loads taking 2
cycles. A few others are also higher, but I don't believe they make a
large difference. I've repurposed the M3 schedule as the latencies are
mostly the same between the two cores, with the M4 having more FP and
DSP instructions. We also turn on MISched and UseAA for the cores that
now use this.
It also adds some schedule Write's to various instruction to make things
simpler.
Differential Revision: https://reviews.llvm.org/D54142
llvm-svn: 360768
LLVM previously used `DW_CFA_def_cfa` instruction in .eh_frame to set
the register and offset for current CFA rule. We change it to
`DW_CFA_def_cfa_register` which is the same one used by GAS that only
changes the register but keeping the old offset.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D61899
llvm-svn: 360765
They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set
which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think
this makes sense since we are using it as a fence.
This also seems to hide the operation from the speculative load hardening pass
so I've reverted r360511.
llvm-svn: 360747
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360738
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360736
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360735
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360734
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360733
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360732
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360731
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360729
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360728
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360727
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360726
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360724
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360722
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360721
This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised.
Differential Revision: https://reviews.llvm.org/D61862
llvm-svn: 360719
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360718
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360716
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360713
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360709
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code. Longer term, we should re-implement that by
using the LLVM disassembler from the Vulkan driver.
Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and
with -filetype=obj I think it did not include any instructions, only the
labels. Fixed with this commit: now it has no effect with -filetype=asm,
and works as intended with -filetype=obj.
Differential Revision: https://reviews.llvm.org/D60682
Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b
llvm-svn: 360688
D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced.
This combine avoids AND immediate mask which usually means we reduce encoding size.
Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily.
Differential Revision: https://reviews.llvm.org/D61830
llvm-svn: 360684
Summary:
This patch adds support for the following instructions:
MLA mul-add, writing addend (Zda = Zda + Zn * Zm[idx])
MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm[idx])
Predicated forms of these instructions were added in SVE.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D61514
llvm-svn: 360682
For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill
the known value instead of extracting the bit.
eg. This sequence is currently used to spill a CRUNSET:
crclr 4*cr5+lt
mfocrf r3,4
rlwinm r3,r3,20,0,0
stw r3,132(r1)
This patch custom lower it to:
li r3,0
stw r3,132(r1)
Differential Revision: https://reviews.llvm.org/D61754
llvm-svn: 360677
This adds support for the arm64_32 watchOS ABI to LLVM's low level tools,
teaching them about the specific MachO choices and constants needed to
disassemble things.
llvm-svn: 360663
This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence.
Differential Revision: https://reviews.llvm.org/D61863
llvm-svn: 360649
Returning SDValue() makes the caller think that nothing happened and it will
end up executing the Expand path. This generates extra nodes that will need to
be pruned as dead code.
Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a
change and it will take care of replacing uses. This will prevent falling into
the Expand path.
llvm-svn: 360627
These are updates to match how isel table would emit a LOCK_OR32mi8 node.
-Use i32 for the immediate zero even though only 8 bits are encoded.
-Use i16 for segment register.
-Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match
64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The
only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1.
-Emit an extra i32 result for the flags output.
I don't know if the types here really matter just noticed it was inconsistent
with normal behavior.
llvm-svn: 360619
Usually this will abort fast-isel at the instruction using the
non-legal result, but if the only use is in a different basic block,
we'll incorrectly assume that the zext/sext is to i32 (rather than
i128 in this case).
Differential Revision: https://reviews.llvm.org/D61823
llvm-svn: 360616
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.
Link: https://github.com/ClangBuiltLinux/linux/issues/469
Reviewers: echristo, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61560
llvm-svn: 360604
Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts.
Differential Revision: https://reviews.llvm.org/D61782
llvm-svn: 360596
Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op.
This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately.
Differential Revision: https://reviews.llvm.org/D61782
llvm-svn: 360594
Summary:
This patch adds the following features defined by Arm SVE2 architecture
extension:
sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm
For existing CPUs these features are declared as unsupported to prevent
scheduler errors.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka
Reviewed By: SjoerdMeijer, rovka
Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61513
llvm-svn: 360573
This adds the FPC (floating-point control register) as a reserved
physical register and models its use by SystemZ instructions.
Note that only the current rounding modes and the IEEE exception
masks are modeled. *Changes* of the FPC due to exceptions (in
particular the IEEE exception flags and the DXC) are not modeled.
At this point, this patch is mostly NFC, but it will prevent
scheduling of floating-point instructions across SPFC/LFPC etc.
llvm-svn: 360570
When deciding the safety of generating smlad, we checked for any
writes within the block that may alias with any of the loads that
need to be widened. This is overly conservative because it only
matters when there's a potential aliasing write to a location
accessed by a pair of loads.
Now we check for aliasing writes only once, during setup. If two
loads are found to have an aliasing write between them, we don't add
these loads to LoadPairs. This means that later during the transform,
we can safely widened a pair without worrying about aliasing.
However, to maintain correctness, we also need to change the way that
wide loads are inserted because the order is now important.
The MatchSMLAD method has also been changed, absorbing
MatchReductions and AddMACCandidate to hopefully improve readability.
Differential Revision: https://reviews.llvm.org/D6102
llvm-svn: 360567
This fixes the link error
ld.lld: error: undefined symbol: llvm::WebAssembly::anyTypeToString(unsigned int)
>>> referenced by WebAssemblyDisassembler.cpp
llvm-svn: 360558
Currently, without -g, BTF sections may still be emitted with
data sections, e.g., for linux kernel bpf selftest
test_tcp_check_syncookie_kern.c issue discovered by Martin
as shown below.
-bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o
[1] VAR 'results' type_id=0, linkage=global-alloc
[2] VAR '_license' type_id=0, linkage=global-alloc
[3] DATASEC 'license' size=0 vlen=1
type_id=2 offset=0 size=4
[4] DATASEC 'maps' size=0 vlen=1
type_id=1 offset=0 size=28
Let disable BTF generation if no debuginfo, which is
the original design.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D61826
llvm-svn: 360556
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.
Original commit message:
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb
but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
llvm-svn: 360552
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360550
See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits.
This helps us with target shuffles and hops in particular.
llvm-svn: 360535
The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference).
I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1.
llvm-svn: 360528
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360510
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360506
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure
llvm-svn: 360505
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360502
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360500
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360498
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360497
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360496
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360494
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360493
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360490
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360488
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360487
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360486
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360485
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360484
As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we *use* the redzone (via a particularly optimization?), but there's no common way to check whether the function *has* a red zone.
I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated.
Differential Revision: https://reviews.llvm.org/D61799
llvm-svn: 360479
After D58632, we can create idempotent atomic operations to the top of stack.
This confused speculative load hardening because it thinks accesses should have
virtual register base except for the cases it already excluded.
This commit adds a new exclusion for this case. I'll try to reduce a test case
for this, but this fix was verified to work by the reporter. This should avoid
needing to revert D58632.
llvm-svn: 360475
Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61789
llvm-svn: 360471
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61680
llvm-svn: 360436
If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free).
Fixes some of the regressions noted in D61782
llvm-svn: 360435
The current PIC model for WebAssembly is more like ELF in that it
allows symbol interposition.
This means that more functions end up being addressed via the GOT
and fewer directly added to the wasm table.
One effect is a reduction in the number of wasm table entries similar
to the previous attempt in https://reviews.llvm.org/D61539 which was
reverted.
Differential Revision: https://reviews.llvm.org/D61772
llvm-svn: 360402
This also allows three op patterns to use increased constant bus
limit of GFX10.
Differential Revision: https://reviews.llvm.org/D61763
llvm-svn: 360395
The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local.
Differential Revision: https://reviews.llvm.org/D58632
llvm-svn: 360393
Summary:
The ".dword" directive is a synonym for ".xword" and is used used
by klibc, a minimalistic libc subset for initramfs.
Reviewers: t.p.northover, nickdesaulniers
Reviewed By: nickdesaulniers
Subscribers: nickdesaulniers, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61719
llvm-svn: 360381
As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2*shuffle+add/sub - so if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count.
This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2*shuffle+add+shuffle to hadd+2*shuffle - I've opened PR41813 to cover this.
Differential Revision: https://reviews.llvm.org/D61308
llvm-svn: 360360
I've started this cleanup more several times now, but got sidetracked
elsewhere, e.g. by llvm-exegesis problems. Not this time, finally!
This is mainly cleaning up the inverse throughput values,
and a few latencies/uops, based on the llvm-exegesis measured values.
Though this is not complete by any means,
there's certainly more cleanup to be done.
The performance numbers (i've only checked by RawSpeed benchmark) aren't
really surprising - overall this *slightly* (< -1%) improves perf.
llvm-svn: 360341
Add an Argument that has the SExtAttr attached, as well as SIToFP
instructions, as values that generate sign bits. SIToFP doesn't
strictly do this and could be treated as a sink to be sign-extended.
Differential Revision: https://reviews.llvm.org/D61381
llvm-svn: 360331
This code was never covered by tests, in PR41786 it was pointed out that
the deletion part doesn't work, and in a full Chrome build I was never
able to hit the code path that looks through copies. It seems the
situation it's supposed to handle doesn't actually come up in practice.
Delete it to simplify the code.
Differential revision: https://reviews.llvm.org/D61671
llvm-svn: 360320
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.
3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.
The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.
llvm-svn: 360293
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.
Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60833
llvm-svn: 360270
This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.
llvm-svn: 360263
Using SP in this position is unpredictable in ARMv7. CMP and CMN are not
affected, and of course v8 relaxes this requirement, but that's handled
elsewhere.
llvm-svn: 360242
Summary:
A COFF stub indirects the reference to a symbol through memory. A
.refptr.$sym global variable pointer is created to refer to $sym.
Typically mingw uses these for external global variable declarations,
but we can use them for weak function declarations as well.
Updates the dso_local classification to add a special case for
extern_weak symbols on COFF in both clang and LLVM.
Fixes PR37598
Reviewers: smeenai, mstorsjo
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D61615
llvm-svn: 360207
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.
Reviewers: arsenm, msearles, rampitec
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61564
llvm-svn: 360199
The single-constant algorithm produces infinities on a lot of denormal values.
The precision of the two-constant algorithm is actually sufficient across the
range of denormals. We will switch to that algorithm for now to avoid the
infinities on denormals. In the future, we will re-evaluate the algorithm to
find the optimal one for PowerPC.
Differential revision: https://reviews.llvm.org/D60037
llvm-svn: 360144
Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining.
llvm-svn: 360134
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.
Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61490
llvm-svn: 360123
TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T *.
Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} * or their const variants.
This allows us to delete many resolve() calls that clutter the code.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D61369
llvm-svn: 360108
The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead up in the code modified in this patch.
If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller.
To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just pick whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume its it is ok to do the same for i32.
Differential Revision: https://reviews.llvm.org/D61457
llvm-svn: 360102
This generally follows what other targets do. I don't completely
understand why the special case for tail calls existed in the first
place; even when the code was committed in r105413, call lowering didn't
work in the way described in the comments.
Stack protector lowering breaks if the register copies are not glued to
a tail call: we have to insert the stack protector check before the tail
call, and we choose the location based on the assumption that all
physical register dependencies of a tail call are adjacent to the tail
call. (See FindSplitPointForStackProtector.) This is sort of fragile,
but I don't see any reason to break that assumption.
I'm guessing nobody has seen this before just because it's hard to
convince the scheduler to actually schedule the code in a way that
breaks; even without the glue, the only computation that could actually
be scheduled after the register copies is the computation of the call
address, and the scheduler usually prefers to schedule that before the
copies anyway.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41417
Differential Revision: https://reviews.llvm.org/D60427
llvm-svn: 360099
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.
Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.
After this patch we should support the d/q for parsing, but not print it when its unneeded.
llvm-svn: 360085
After support for dealing with types that need to be extended in some way was
added in r358032 we didn't correctly handle <1 x T> return types. These types
don't have a GISel direct representation, instead we just see them as scalars.
When we need to pad them into <2 x T> types however we need to use a
G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR.
This fixes PR41738.
llvm-svn: 360068
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.
Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.
This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
llvm-svn: 360066
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.
The culprit commit is r274535.
llvm-svn: 360043
scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there).
llvm-svn: 360028
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Author: LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60550
llvm-svn: 360017
This saves us some unnecessary copies.
If the inputs to a G_SELECT are floating point, we should use fcsel rather than
csel.
Changes here are...
- Teach selectCopy about s1-to-s1 copies across register banks.
- AArch64RegisterBankInfo about G_SELECT in general.
- Teach the instruction selector about the FCSEL instructions.
Also add two tests:
- select-select.mir to show that we get the expected FCSEL
- regbank-select.mir (unfortunately named) to show the register banks on
G_SELECT are properly preserved
And update fast-isel-select.ll to show that we do the same thing as other
instruction selectors in these cases.
llvm-svn: 359940
The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned.
But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas.
The printing code will still only use the suffix for the memory form where it is needed.
llvm-svn: 359903
The VOP3 form should always be the preferred selection form to be
shrunk later.
The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.
llvm-svn: 359899
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.
llvm-svn: 359891
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.
llvm-svn: 359883
The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes.
Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying.
This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg.
Fixes PR41678
Differential Revision: https://reviews.llvm.org/D61453
llvm-svn: 359837
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.
Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K
Differential Revision: https://reviews.llvm.org/D61426
llvm-svn: 359786
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.
llvm-svn: 359768
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076
llvm-svn: 359764
The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.
Patch by Pengfei Wang
Differential Revision: https://reviews.llvm.org/D61295
llvm-svn: 359753
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.
Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.
llvm-svn: 359734
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).
Differential Revision: https://reviews.llvm.org/D61263
llvm-svn: 359707
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.
Differential Revision: https://reviews.llvm.org/D61189
llvm-svn: 359666
Add support for f16 libcalls in WebAssembly. This entails adding signatures
for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to
truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt
already supports this).
Differential Revision: https://reviews.llvm.org/D61287
Reviewer: dschuff
llvm-svn: 359600
The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue.
This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.
Differential Revision: https://reviews.llvm.org/D61164
llvm-svn: 359582
This removes some of the class variables. Merge basic block processing into
runOnMachineFunction to keep the flags local.
Pass MachineBasicBlock around instead of an iterator. We can get the iterator in
the few places that need it. Allows a range-based outer for loop.
Separate the Atom optimization from the rest of the optimizations. This allows
fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA
when profitable by its heuristics.
I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index
register is equal to the destination register. This is profitable regardless of
the various slow flags. But again we would want Atom to be able to undo that.
Differential Revision: https://reviews.llvm.org/D60993
llvm-svn: 359581
This implements TargetTransformInfo method getMemcpyCost, which estimates the
number of instructions to which a memcpy instruction expands to.
Differential Revision: https://reviews.llvm.org/D59787
llvm-svn: 359547
Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd.
INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32.
So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4.
https://bugs.llvm.org/show_bug.cgi?id=41512
Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike.
Committed on behalf of @Serge_Preis (Serge Preis)
Differential Revision: https://reviews.llvm.org/D60852
llvm-svn: 359545
Bail out on function arguments/returns with types aggregating an
unsupported type. This fixes cases where we would happily and
incorrectly lower functions taking e.g. [1 x i64] parameters, when we
don't even support plain i64 yet.
llvm-svn: 359540
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.
This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.
Differential Revision: https://reviews.llvm.org/D59785
llvm-svn: 359537
The WebAssembly backend needs to know the signatures of all runtime
libcall functions. This adds the signature for __stack_chk_fail which was
previously missing.
Also, make the error message for a missing libcall include the name of
the function.
Differential Revision: https://reviews.llvm.org/D59521
Reviewed By: sbc100
llvm-svn: 359505
Change the PPCISelLowering.cpp function that decides to avoid update form in
favor of partial vector loads to know about newer load types and to not be
confused by the chain operand.
Differential Revision: https://reviews.llvm.org/D60102
llvm-svn: 359504
This was falling back and gives us a reason to create a selectIntrinsic function
which we would need eventually anyway. Update arm64-crypto.ll to show that we
correctly select it.
Also factor out the code for finding an intrinsic ID.
llvm-svn: 359501
This is necessary since SVN r330706, as tail merging can include
CFI instructions since then.
This fixes PR40322 and PR40012.
Differential Revision: https://reviews.llvm.org/D61252
llvm-svn: 359496
Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support.
This does mean we can go through bitcasts so we need to bitcast the extracted args to ensure they are the correct type
Fixes PR39936 and should help with PR39920/PR39921
Differential Revision: https://reviews.llvm.org/D61245
llvm-svn: 359491
Use size_t assignment to prevent a bad explicit type conversion warning.
Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning.
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359479
This patch adds aliases for element sizes .B/.H/.S to the
AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts
these instructions with all element sizes up to 64-bit (.D). The
preferred disassembly is .D.
llvm-svn: 359457
Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.
Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60618
llvm-svn: 359433
Summary:
The Procedure Call Standard for the Arm Architecture
states that float16x4_t and float16x8_t behave just
as uint16x4_t and uint16x8_t for argument passing.
This patch adds the fp16 vectors to the
ARMCallingConv.td file.
Reviewers: miyuki, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60720
llvm-svn: 359431
The 128/256 bit version of these instructions require an 'x' or 'y' suffix to
disambiguate the memory form in att syntax.
We were allowing the same suffix in intel syntax, but it appears gas does not
do that.
gas does allow the 'x' and 'y' suffix on register and broadcast forms even
though its not needed. We were allowing it on unmasked register form, but not on
masked versions or on masked or unmasked broadcast form.
While there fix some test coverage holes so they can be extended with the 'x'
and 'y' suffix tests.
llvm-svn: 359418
Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results.
llvm-svn: 359400
An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.
Differential Revision: https://reviews.llvm.org/D61230
llvm-svn: 359396
Summary:
The register form of these instructions are CodeGenOnly instructions that cover
GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for
the opposite bitcast. Due to the patterns using bitcasts these instructions get
marked as "bitcast" machine instructions as well. The peephole pass is able to
look through these as well as other copies to try to avoid register bank copies.
Because FR32/FR64/VR128 are all coalescable to each other we can end up in a
situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to
GR32->GR64 which the copyPhysReg code can't handle.
To prevent this, this patch removes one set of the 'bitcast' instructions. So
now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that
converts from GR32/GR64->VR128 has no special significance to the peephole pass
and won't be looked through.
I guess the other option would be to add support to copyPhysReg to just promote
the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined
anyway. But removing the CodeGenOnly instruction in favor of one that won't be
optimized seemed safer.
I deleted the peephole test because it couldn't be made to work with the bitcast
instructions removed.
The load version of the instructions were unnecessary as the pattern that selects
them contains a bitcasted load which should never happen.
Fixes PR41619.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61223
llvm-svn: 359392
As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.
llvm-svn: 359386
Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector.
We can extend this in the future to handle more opcodes and non-zero selections.
llvm-svn: 359378
Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store.
Reviewers: spatel, RKSimon, reames, jfb, efriedma
Reviewed By: RKSimon
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60546
llvm-svn: 359368
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.
We discovered some internal test failures, so reverting for now.
Differential Revision: https://reviews.llvm.org/D61213
llvm-svn: 359363
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.
Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.
llvm-svn: 359351
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.
Refactors a few subclasses to support the target independent %a, %c, and
%n.
The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.
It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.
Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449
Reviewers: echristo, void
Reviewed By: void
Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60887
llvm-svn: 359337
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.
Update select-bswap.mir and arm64-rev.ll to reflect the changes.
llvm-svn: 359331
The PPC vector cost model values for insert/extract element reflect older
processors that lacked vector insert/extract and move-to/move-from VSR
instructions. Update getVectorInstrCost to give appropriate values for when
the newer instructions are present.
Differential Revision: https://reviews.llvm.org/D60160
llvm-svn: 359313
Create a matchBitOpReduction helper that checks for the pattern with any opcode.
First step towards reusing this code to recognize other scalar reduction patterns.
llvm-svn: 359296
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).
Differential Revision: https://reviews.llvm.org/D61068
llvm-svn: 359293
A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380.
Differential Revision: https://reviews.llvm.org/D60512
llvm-svn: 359292
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.
Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)
Differential revision: https://reviews.llvm.org/D61124
llvm-svn: 359287
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.
Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)
Differential Revision: https://reviews.llvm.org/D60279
llvm-svn: 359248
All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so all irregularities have to be implemented in one place only.
The test generation script wmma.py has been refactored in a similar way.
Differential Revision: https://reviews.llvm.org/D60015
llvm-svn: 359247
PTX 6.3 requires using ".aligned" in the MMA instruction names.
In order to generate correct name, now we pass current
PTX version to each instruction as an extra constant operand
and InstPrinter adjusts its output accordingly.
Differential Revision: https://reviews.llvm.org/D59393
llvm-svn: 359246
Generalized constructions of 'fragments' of MMA operations to provide
common primitives for construction of the ops. This will make it easier
to add new variants of the instructions that operate on integer types.
Use nested foreach loops which makes it possible to better control
naming of the intrinsics.
This patch does not affect LLVM's output, so there are no test changes.
Differential Revision: https://reviews.llvm.org/D59389
llvm-svn: 359245
I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.
P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these.
(Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)
Reviewed By: dblaikie
Author: Arthur O'Dwyer
Differential Revision: https://reviews.llvm.org/D54885
llvm-svn: 359236
This case was missing before, so we couldn't legalize it.
Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.
llvm-svn: 359231
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).
Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.
Differential Revision: https://reviews.llvm.org/D60889
llvm-svn: 359222
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.
Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.
llvm-svn: 359204
Truncate the movmsk scalar integer result to the equivalent scalar integer width as before but then bitcast to the requested type.
We still have the issue identified in PR41594 but D61114 should handle this.
llvm-svn: 359176
On Mips32r2 bitcast can be expanded to two sw instructions and an ldc1
when using bitcast i64 to double or an sdc1 and two lw instructions when
using bitcast double to i64. By introducing custom lowering that uses
mtc1/mthc1 we can avoid excessive instructions.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D61069
llvm-svn: 359171
The IndexReg will always be non-null at this point. Earlier in the function, if
IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it
non-null.
llvm-svn: 359170
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.
Reviewers: rnk
Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D61083
llvm-svn: 359149
Using initial-exec TLS variables is a reasonable performance
optimisation for system libraries. Use the correct PIC mechanism to get
hold of the GOT to avoid text relocations.
Differential Revision: https://reviews.llvm.org/D61026
llvm-svn: 359146
ReplaceAllUsesWith doesn't remove the node that was replaced. So its left around in the graph messing up use counts on other nodes.
One thing to note, is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive because the isel table walking code would still have a reference to it that its going to try to match next. I don't think that's the case here though because the nodes being deleted here should be "and", "srl", and "zero_extend" none of which can be the root node of an LEA match.
Differential Revision: https://reviews.llvm.org/D61048
llvm-svn: 359121
If the types don't match, we can't just remove the shuffle.
There may be some other opportunity for optimization here,
but this should prevent the crashing seen in:
https://bugs.llvm.org/show_bug.cgi?id=41414
llvm-svn: 359095
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
Summary:
The MachineFunction should have been created with the correct subtarget. As
long as there is no way to change it, MipsTargetMachine can just capture it
directly from the MachineFunction without calling getSubtargetImpl again.
While there, const correct the Subtarget pointer to avoid a const_cast.
I believe the Mips16Subtarget and NoMips16Subtarget members are never used, but
I'll leave there removal for a separate patch.
Reviewers: echristo, atanasyan
Reviewed By: atanasyan
Subscribers: sdardis, arichardson, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60936
llvm-svn: 359071
Add selection support for G_INTRINSIC_ROUND, add a selection test, and add
check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll.
llvm-svn: 359046
Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its
input and output are assigned the correct register bank.
Add a regbankselect test to verify that we get what we expect here.
llvm-svn: 359044
Summary:
Always convert switches to br_tables unless there is only one case,
which is equivalent to a simple branch. This reduces code size for wasm,
and we defer possible jump table optimizations to the VM.
Addresses PR41502.
Reviewers: kripken, sunfish
Subscribers: dschuff, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60966
llvm-svn: 359038
Apparently FileCheck wasn't actually matching the fallback check lines in
arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for
G_INTRINSIC_TRUNC there.
Actually hook it up into AArch64InstructionSelector.cpp and write a proper
selection test.
I guess I'll figure out the FileCheck magic to make the fallback checks work
properly in arm64-vfloatintrinsics.ll.
llvm-svn: 359030
Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test.
Update arm64-vfloatintrinsics.ll now that we can select it.
llvm-svn: 359022
Same patch as G_FCEIL etc.
Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.
llvm-svn: 359021
Same as G_FCEIL, G_FABS, etc. Just move it into that rule.
Add a legalizer test for G_FMA, which we didn't have before and update
arm64-vfloatintrinsics.ll.
llvm-svn: 359015
Circling back to a leftover bit from PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859#c1
...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'.
This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca.
We already do this transform for the SETULT predicate, so this makes the code more
symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect
anything but pre-SSE4.1 subtargets.
$ cat before.s
movdqa -16(%rip), %xmm2 ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
pxor %xmm0, %xmm2
pcmpgtw -32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255]
pand %xmm2, %xmm0
pandn %xmm1, %xmm2
por %xmm2, %xmm0
$ cat after.s
movdqa -16(%rip), %xmm2 ## xmm2 = [256,256,256,256,256,256,256,256]
psubusw %xmm0, %xmm2
pxor %xmm3, %xmm3
pcmpeqw %xmm2, %xmm3
pand %xmm3, %xmm0
pandn %xmm1, %xmm3
por %xmm3, %xmm0
$ llvm-mca before.s -mcpu=haswell
Iterations: 100
Instructions: 600
Total Cycles: 909
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 0.77
IPC: 0.66
Block RThroughput: 1.8
$ llvm-mca after.s -mcpu=haswell
Iterations: 100
Instructions: 700
Total Cycles: 409
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 1.71
IPC: 1.71
Block RThroughput: 1.8
Differential Revision: https://reviews.llvm.org/D60838
llvm-svn: 358999
64bit mode must use 64bit registers, otherwise assumptions about the top
half of the registers are made. Problem found by Takeshi Nakayama in
NetBSD.
llvm-svn: 358998
This patch adds support for parsing and assembling the %tls_ie_pcrel_hi
and %tls_gd_pcrel_hi modifiers.
Differential Revision: https://reviews.llvm.org/D55342
llvm-svn: 358994
Essentially complete a proper rebase of the V3 metadata change over
https://reviews.llvm.org/D49096.
Minimize the diff between the V2 and V3 variants of the relevant lit
tests, and clean up some trailing whitespace.
llvm-svn: 358992
The manual says that Thumb2 add/sub instructions are only allowed to modify sp
if the first source is also sp. This is slightly different from the usual rGPR
restriction since it's context-sensitive, so implement it in C++.
llvm-svn: 358987