As pointed out in post-commit review of r318738, `return ReplaceNode(..)` when
both ReplaceNode and the current function return void is confusing. This patch
moves to using a more obvious early return, and moves to just using an if to
catch the one case we currently care about. A future patch that adds further
custom instruction selection can introduce a switch.
llvm-svn: 318757
The obvious approach of defining a pattern like the one below actually doesn't
work:
`def : Pat<(i32 0), (i32 X0)>;`
As was noted when Lanai made this change (https://reviews.llvm.org/rL288215),
attempting to handle the constant 0 in tablegen leads to assertions due to a
physical register being used where a virtual register is expected.
llvm-svn: 318738
Previous patches primarily ensured that codegen was possible for the standard
RISC-V instructions. However, there are a number of IR inputs that wouldn't be
appropriately lowered. This patch both adds test cases and supports lowering
for a number of these cases:
* Improved sext/zext/trunc support
* Support for setcc variants that don't map directly to RISC-V instructions
* Lowering mul, and hence support for external symbols
* addc, adde, subc, sube
* mulhs, srem, mulhu, urem, udiv, sdiv
* {srl,sra,shl}_parts
* brind
* br_jt
* bswap, ctlz, cttz, ctpop
* rotl, rotr
* BlockAddress operands
Differential Revision: https://reviews.llvm.org/D29938
llvm-svn: 318737
Although ISD::SELECT_CC is a more natural match for RISCVISD::SELECT_CC (and
ultimately the integer RISC-V conditional branch instructions), we choose to
expand ISD::SELECT_CC and lower ISD::SELECT. The appropriate compare+branch
will be created in the case where an ISD::SELECT condition value is created by
an ISD::SETCC node, which operates on XLen types. Other datatypes such as
floating point don't have conditional branch instructions, and lowering
ISD::SELECT allows more flexibility for handling these cases.
Differential Revision: https://reviews.llvm.org/D29937
llvm-svn: 318735
We don't need separate 32 and 64 node types. We can use SDTCisInt and SDTCisSameSizeAs to ensure the mask size the result type and is integer.
llvm-svn: 318732
DAGTypeLegalizer::SplitInteger uses default pointer size as shift amount constant type,
which causes less performant ISA in amdgcn---amdgiz target since the default pointer
type is i64 whereas the desired shift amount type is i32.
This patch fixes that by using TLI.getScalarShiftAmountTy in DAGTypeLegalizer::SplitInteger.
The X86 change is necessary since splitting i512 requires shifting amount of 256, which
cannot be held by i8.
Differential Revision: https://reviews.llvm.org/D40148
llvm-svn: 318727
Initialize IsVis2 and IsVis3 in SparcSubtarget::initializeSubtargetDependencies.
MSan detected uninitialized read of IsVis3 after r318704. Initializing the
variables to false will prevent undefined behavior.
llvm-svn: 318724
Summary:
while investigating performance degradation of imagick benchmark
there were found inefficient pattern for UINT_TO_FP conversion.
That pattern causes RAW hazard in assembly code. Specifically,
uitofp IR operator results in poor assembler :
st %i0, [%fp - 952]
ldd [%fp - 952], %f0
it stores 32-bit integer register into memory location and then
loads 64-bit floating point data from that location.
That is exactly RAW hazard case. To optimize that case it is
possible to use SPISD::ITOF and SPISD::XTOF for conversion from
integer to floating point data type and to use ISD::BITCAST to
copy from integer register into floating point register.
The fix is to write custom UINT_TO_FP pattern using SPISD::ITOF,
SPISD::XTOF, ISD::BITCAST.
Patch by Alexey Lapshin
Reviewers: fedor.sergeev, jyknight, dcederman, lero_chris
Reviewed By: jyknight
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36875
llvm-svn: 318704
Move the hazard scheduling pass to after the long branch pass, as the
long branch pass can create forbiddden slot hazards. Rather than complicating
the implementation of the long branch pass to handle forbidden slot hazards,
just reorder the passes.
llvm-svn: 318657
The VSX versions have the advantage of a full 64-register target whereas the FP
ones have the advantage of lower latency and higher throughput. So what we’re
after is using the faster instructions in low register pressure situations and
using the larger register file in high register pressure situations.
The heuristic chooses between the following 7 pairs of instructions.
PPC::LXSSPX vs PPC::LFSX
PPC::LXSDX vs PPC::LFDX
PPC::STXSSPX vs PPC::STFSX
PPC::STXSDX vs PPC::STFDX
PPC::LXSIWAX vs PPC::LFIWAX
PPC::LXSIWZX vs PPC::LFIWZX
PPC::STXSIWX vs PPC::STFIWX
Differential Revision: https://reviews.llvm.org/D38486
llvm-svn: 318651
Summary:
This patch fixes an issue so that the right alias is printed when the instruction has tied operands. It checks the number of operands in the resulting instruction as opposed to the alias, and then skips over tied operands that should not be printed in the alias.
This allows to generate the preferred assembly syntax for the AArch64 'ins' instruction, which should always be displayed as 'mov' according to the ARM Architecture Reference Manual. Several unit tests have changed as a result, but only to reflect the preferred disassembly. Some other InstAlias patterns (movk/bic/orr) needed a slight adjustment to stop them becoming the default and breaking other unit tests.
Please note that the patch is mostly the same as https://reviews.llvm.org/D29219 which was reverted because of an issue found when running TableGen with the Address Sanitizer. That issue has been addressed in this iteration of the patch.
Reviewers: rengolin, stoklund, huntergr, SjoerdMeijer, rovka
Reviewed By: rengolin, SjoerdMeijer
Subscribers: fhahn, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D40030
llvm-svn: 318650
kernel verifier is becoming smarter and soon will support
direct and indirect function calls.
Remove obsolete error from BPF backend.
Make call to use PCRel_4 fixup.
'bpf to bpf' calls are distinguished from 'bpf to kernel' calls
by insn->src_reg == BPF_PSEUDO_CALL == 1 which is used as relocation
indicator similar to ld_imm64->src_reg == BPF_PSEUDO_MAP_FD == 1
The actual 'call' instruction remains the same for both
'bpf to kernel' and 'bpf to bpf' calls.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 318614
We don't need a dyn_cast, the predicate already specified the base node. We only need to check the type of the index, the base ptr is guaranteed to be scalar.
llvm-svn: 318596
This makes sure that functions that only clobber xmm registers
(on win64) also get the right cfi directives, if dwarf exceptions
are enabled.
Differential Revision: https://reviews.llvm.org/D40191
llvm-svn: 318591
We used to detect loads feeding fp instructions, but we were
failing to take into account cases where this happens through copies.
For instance, loads can fed copies coming from the ABI lowering
of floating point arguments/results.
llvm-svn: 318589
We used to detect that stores were fed by fp instructions, but we were
failing to take into account cases where this happens through copies.
For instance, stores can be fed by copies coming from the ABI lowering
of floating point arguments.
llvm-svn: 318588
This is mostly moving VMEM clause breaking into
the hazard recognizer. Also move another hazard
currently handled in the waitcnt pass.
Also stops breaking clauses unless xnack is enabled.
llvm-svn: 318557
Enabling and using dwarf exceptions seems like an easier path
to take, than to make the COFF/ARM backend output EHABI directives.
Previously, no EH model was enabled at all on this target.
There's no point in setting UseIntegratedAssembler to false since
GNU binutils doesn't support Windows on ARM, and since we don't
need to support external assembler, we don't need to use register
numbers in cfi directives.
Differential Revision: https://reviews.llvm.org/D39532
llvm-svn: 318510
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
't' constraint normally only accepts f32 operands, but for VCVT the
operands can be i32. LLVM is overly restrictive and rejects asm like:
float foo() {
float result;
__asm__ __volatile__(
"vcvt.f32.s32 %[result], %[arg1]\n"
: [result]"=t"(result)
: [arg1]"t"(0x01020304) );
return result;
}
Relax the value type for 't' constraint to either f32 or i32.
Differential Revision: https://reviews.llvm.org/D40137
llvm-svn: 318472
Only do this pre-legalize in case we're using the sign extend to legalize for KNL.
This recovers all of the tests that changed when I stopped SelectionDAGBuilder from deleting sign extends.
There's more work that could be done here particularly to fix the i8->i64 test case that experienced split.
llvm-svn: 318468
The wider element type will normally cause legalize to try to split and scalarize the gather/scatter, but we can't handle that. Instead, truncate the index early so the gather/scatter node is insulated from the legalization.
This really shouldn't happen in practice since InstCombine will normalize index types to the same size as pointers.
llvm-svn: 318452
This patch changes all i32 constant in store instruction to i64 with truncation, to increase the chance that the referenced constant can be shared with other i64 constant.
Differential Revision: https://reviews.llvm.org/D39352
llvm-svn: 318436
This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2.
Reviewers: delena, RKSimon, craig.topper, dorit
Reviewed By: dorit
Differential Revision: https://reviews.llvm.org/D40008
llvm-svn: 318385
The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG.
llvm-svn: 318380
Add hook in BPF backend so that llvm-objdump can print out
the jmp target with label names, e.g.,
...
if r1 != 2 goto 6 <LBB0_2>
...
goto 7 <LBB0_4>
...
LBB0_2:
...
LBB0_4:
...
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 318358
Summary:
This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV,
causes TableGen to instrument the generated table to collect rule coverage
information. However, LLVM_ENABLE_GISEL_COV goes a bit further than
LLVM_ENABLE_DAGISEL_COV. The information is written to files
(${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be
concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will
read this information and use it to emit warnings about untested rules.
This technique could also be used by SelectionDAG and can be further
extended to detect hot rules and give them priority over colder rules.
Usage:
* Enable LLVM_ENABLE_GISEL_COV in CMake
* Build the compiler and run some tests
* cat gisel-coverage-[0-9]* > gisel-coverage-all
* Delete lib/Target/*/*GenGlobalISel.inc*
* Build the compiler
Known issues:
* ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual
step due to a lack of a portable 'cat' command. It should be the
concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files.
* There's no mechanism to discard coverage information when the ruleset
changes
Depends on D39742
Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka
Reviewed By: rovka
Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39747
llvm-svn: 318356
Summary:
Make it possible to feed runtime information back to tablegen to enable
profile-guided tablegen-eration, detection of untested tablegen definitions, etc.
Being a cross-compiler by nature, LLVM will potentially collect data for multiple
architectures (e.g. when running 'ninja check'). We therefore need a way for
TableGen to figure out what data applies to the backend it is generating at the
time. This patch achieves that by including the name of the 'def X : Target ...'
for the backend in the TargetRegistry.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D39742
llvm-svn: 318352
Use VOP3 add/addc like usual.
This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.
This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.
llvm-svn: 318340
Move remaining inline matching of instructions of some optimizations into
separate functions, like in the other optimizations. Otherwise, NFC.
Differential revision: https://reviews.llvm.org/D40090
llvm-svn: 318335
Implements TargetLowering callback 'mayBeEmittedAsTailCall' that enables
CodeGenPrepare to duplicate returns when they might enable a tail-call.
Differential Revision: https://reviews.llvm.org/D39777
llvm-svn: 318321
Some CPUs are already overriding these sign extension instructions but we should be able to use the WriteALU schedule class by default.
Differential Revision: https://reviews.llvm.org/D39899
llvm-svn: 318308
Move the calling convention checks for tail-call eligibility for the 64-bit
SysV ABI into a separate function. This is so that it can be shared with
'mayBeEmittedAsTailCall' in a subsequent change.
llvm-svn: 318305
Summary:
Prevent an issue where a diagnostic is reported multiple times by bailing out with a ParseFail if an invalid SVE register element qualifier/suffix is specified, for example:
<stdin>:10:18: error: invalid sve vector kind qualifier
add z20.h, z2.h, z31.x
^
<stdin>:10:18: error: invalid sve vector kind qualifier
add z20.h, z2.h, z31.x
...
<stdin>:10:18: error: invalid sve vector kind qualifier
add z20.h, z2.h, z31.x
^
Reviewers: fhahn, rengolin
Reviewed By: rengolin
Subscribers: aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D39894
llvm-svn: 318297
APInt is now used instead of uint64_t in function genConstMult() allowing
multiplication optimizations with constants of arbitrary length.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D38130
llvm-svn: 318296
This is a refactoring/cleanup of Arm `addrmode2` operand class. The patch
removes it completely.
Differential Revision: https://reviews.llvm.org/D39832
llvm-svn: 318291
This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction.
While there add an extra output VT to X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking.
llvm-svn: 318278
This adjusts the tests to hopfully pacify the
llvm-clang-x86_64-expensive-checks-win buildbot.
Unlike many other instructions, these instructions have aliases which
take coprocessor registers, gpr register, accumulator (and dsp accumulator)
registers, floating point registers, floating point control registers and
coprocessor 2 data and control operands.
For the moment, these aliases are treated as pseudo instructions which are
expanded into the underlying instruction. As a result, disassembling these
instructions shows the underlying instruction and not the alias.
Reviewers: slthakur, atanasyan
Differential Revision: https://reviews.llvm.org/D35253
llvm-svn: 318207
In rare cases, common code will attempt to select an OR of two
constants. This confuses the logic in splitLargeImmediate,
causing an internal error during isel. Fixed by simply leaving
this case to common code to handle.
This fixes PR34859.
llvm-svn: 318187
Before using the 32-bit RISBMux set of instructions we need to
verify that the input bits are actually within range of the 32-bit
instruction. This fixer PR35289.
llvm-svn: 318177
Because the block-splitting code is multi-purpose, we have to meddle with the
branches when using it to fixup a conditional branch destination. We got the
code right, but forgot to update the CFG so the verifier complained when
expensive checks were on.
Probably harmless since constant-islands comes so late, but best to fix it
anyway.
llvm-svn: 318148
Get rid of the handwritten instruction selector code for handling
G_CONSTANT. This code wasn't checking all the preconditions correctly
anyway, so it's better to leave it to TableGen, which can handle at
least some cases correctly (e.g. MOVi, MOVi16, folding into binary
operations). Also add tests to cover those cases.
llvm-svn: 318146
When we emit a tail call for Armv8-M, but then discover that the caller needs to
save/restore `LR`, we convert the tail call to an ordinary one, since restoring
`LR` takes extra instructions, which may negate the benefits of the tail
call. If the callee, however, takes stack arguments, this conversion is
incorrect, since nothing has been done to pass the stack arguments.
Thus the patch reverts https://reviews.llvm.org/rL294000
Also, we improve the instruction sequence for popping `LR` in the case when we
couldn't immediately find a scratch low register, but we can use as a temporary
one of the callee-saved low registers and restore `LR` before popping other
callee-saves.
Differential Revision: https://reviews.llvm.org/D39599
llvm-svn: 318143
If the register from the copy from exec was spilled,
the copy before the spill was deleted leaving a spill
of undefined register verifier error and miscompiling.
Check for other use instructions of the copy register.
llvm-svn: 318132
Summary:
This fixes PR35221.
Use pseudo-instructions to let MachineCSE hoist global address computation.
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39871
llvm-svn: 318081
If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address.
This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase.
We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it.
Differential Revision: https://reviews.llvm.org/D39927
llvm-svn: 318057
Remove builtins from llvm and add AutoUpgrade support.
Also add fast-isel tests for the TEST and TESTN instructions.
Differential Revision: https://reviews.llvm.org/D38736
llvm-svn: 318036
When generating table jump code for switch statements, place the jump
table label as the first operand in the various addition instructions
in order to enable addressing mode selectors to better match index
computation and possibly fold them into the addressing mode of the
table entry load instruction.
Differential revision: https://reviews.llvm.org/D39752
llvm-svn: 318033
This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38671
Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661
llvm-svn: 318026
Updated the scheduling information of the SKX subtarget in the file X86SchedSkylakeServer.td under lib/Target/X86 to:
1. add regular opcodes in addition to the suffixed "_Int" opcodes
2. add the (V)MAXCPD/MAXCPS/MAXCSD/MAXCSS/MINCPD/MINCPS/MINCSD/MINCSS
instructions that are equivalent to their counterparts without the 'C' as they are part of a hack to
make floating point min/max commutable under fast math.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39833
Change-Id: Ie13702a5ce1b1a08af91ca637a52b6962881e7d6
llvm-svn: 318024
This was using a custom function that didn't handle the
addressing modes properly for private. Use
isLegalAddressingMode to avoid duplicating this.
Additionally, skip the combine if there is only one use
since the standard combine will handle it.
llvm-svn: 318013
The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0.
This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space.
We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used.
I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches.
llvm-svn: 318008
I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations.
llvm-svn: 318007
matchBinOpReduction currently matches against a single opcode, but we already have a case where we repeat calls to try to match against AND/OR and I'll be shortly adding another case for SMAX/SMIN/UMAX/UMIN (D39729).
This NFCI patch alters matchBinOpReduction to try and pattern match against any of the provided list of candidate bin ops at once to save time.
Differential Revision: https://reviews.llvm.org/D39726
llvm-svn: 317985
This is consistent with out normal implementation of scalar instructions.
While there disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size which is also our standard practice.
llvm-svn: 317977
* The method getRegAllocationHints() is now of bool type instead of void. If
true is returned, regalloc (AllocationOrder) will *only* try to allocate the
hints, as opposed to merely trying them before non-hinted registers.
* TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with
an increase in number of LOCRs.
In this case, it is desired to force the hints even though there is a slight
increase in spilling, because if a non-hinted register would be allocated,
the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR
(Load On Condition) SystemZ instruction must have both operands in either the
low or high part of the 64 bit register.
Reviewers: Quentin Colombet and Ulrich Weigand
https://reviews.llvm.org/D36795
llvm-svn: 317879
r600 uses dummy pointer info for lowering load/store. Since dummy pointer info
assumes address space 0, this causes isel failure when temporary load/store SDNodes
are generated for amdgiz environment.
Since the offest is not constant, FixedStack pseudo source value cannot be used
to create the pointer info. This patch creates pointer info using llvm undef value.
At least this provides correct address space so that isel can be done correctly.
Differential Revision: https://reviews.llvm.org/D39698
llvm-svn: 317862
The pointer info for pseudo source for r600 is not correct when
alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39670
llvm-svn: 317861
We don't really need any special handling of "offsettable"
memory addresses, but since some existing code uses inline
asm statements with the "o" constraint, add support for this
constraint for compatibility purposes.
llvm-svn: 317807
Correct the definition of 'j' as being unavailable for microMIPS32R6 and
provide the 'b' assembly idiom for codegen purposes for microMIPS32r3.
Provide the necessary 'br' pattern for microMIPS32R6 as it now longer
incorrectly uses the 'j' instruction.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39741
llvm-svn: 317801
No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority.
llvm-svn: 317775
r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes.
llvm-svn: 317748
This reverts r317579, originally committed as r317100.
There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.
When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
int a, c, d, e, f, g, h, i, j, k, l, m;
void n(int o, int *b) {
if (g)
f = 0;
for (; f < o; f++) {
m = a;
if (l > j * k > i)
j = i = k = d;
h = b[c] - e;
}
}
We get assembly that looks like this:
; BB#1: ; %if.then
Lloh3:
adrp x9, _f@GOTPAGE
Lloh4:
ldr x9, [x9, _f@GOTPAGEOFF]
mov w8, wzr
Lloh5:
str wzr, [x9]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.lt LBB0_3
b LBB0_7
LBB0_2: ; %entry.if.end_crit_edge
Lloh6:
adrp x8, _f@GOTPAGE
Lloh7:
ldr x8, [x8, _f@GOTPAGEOFF]
Lloh8:
ldr w8, [x8]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.ge LBB0_7
LBB0_3: ; %for.body.lr.ph
Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.
llvm-svn: 317726
Previously, hasSideEffects was ? for TargetOpcode::PHI and would be inferred
as 1. D37065 sets the previously inferred properties explicitly. This patch sets
hasSideEffects=0 for PHI, as it is for G_PHI. MachineInstr::isSafeToMove has
been updated so it still returns false for PHI.
Additionally, HexagonBitSimplify relied on a PHI node having the
hasUnmodeledSideEffects property. This patch fixes that assumption.
Differential Revision: https://reviews.llvm.org/D37097
llvm-svn: 317721
We were calling tryFoldLoad with the 'and' node was the root and parent node of the load. But the parent of the load should be the shift that proceeds the and. While the and node is correctly the root node.
To fix this I had to make tryFoldLoad take a separate use and root input. I've added a convenience version with the old signature to avoid updating the other call sites.
llvm-svn: 317720
Without this we can't parse gather instructions in ms inline asm blocks. The validateInstruction function was introduced in r316700 to check gather constraints.
llvm-svn: 317713
Previously, an "r" constraint would mean the compiler provides a value
on WebAssembly's operand stack. This was tricky to use properly,
particularly since it isn't possible to declare a new local from within
an inline asm string.
With this patch, "r" provides the value in a WebAssembly local, and the
local index is provided to the inline asm string. This requires inline
asm to use get_local and set_local to read the register. This does
potentially result in larger code size, however inline asm should
hopefully be quite rare in WebAssembly.
This also means that the "m" constraint can no longer be supported, as
WebAssembly has nothing like a "memory operand" that includes an
implicit get_local.
This fixes PR34599 for the wasm32-unknown-unknown-wasm target (though
not for the ELF target).
llvm-svn: 317707
Note that this is just enough for simple function call examples to generate
working code. Support for varargs etc follows in future patches.
Differential Revision: https://reviews.llvm.org/D29936
llvm-svn: 317691
A good portion of this patch is the extra functions that needed to be
implemented to support the test case. e.g. storeRegToStackSlot,
loadRegFromStackSlot, eliminateFrameIndex.
Setting ISD::BR_CC to Expand may appear non-obvious on an architecture with
branch+cmp instructions. However, I found it much easier to deal with matching
the expanded form.
I had to change simm13_lsb0 and simm21_lsb0 to inherit from the
Operand<OtherVT> class rather than Operand<i32> in order to keep tablegen
happy. This isn't a big deal, but it does seem a shame to lose the uniformity
across immediate types when there's not an obvious benefit (I'm hoping a
tablegen expert will educate me on what I'm missing here!).
Differential Revision: https://reviews.llvm.org/D29935
llvm-svn: 317690
This required the implementation of RISCVTargetInstrInfo::copyPhysReg. Support
for lowering global addresses follow in the next patch.
Differential Revision: https://reviews.llvm.org/D29934
llvm-svn: 317685
Previously these pseudo instructions were not guarded by ISA, so their
select was dependant on the ordering of the entries in the DAG matcher.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39723
llvm-svn: 317681
rL162640 introduced CodeGenTarget::guessInstructionProperties. If a target
sets guessInstructionProperties=0 in its FooInstrInfo, tablegen will error if
it has to guess properties from patterns. Unfortunately,
guessInstructionProperties=0 can't be used with current upstream LLVM as
instructions in the TargetOpcode namespace are always included and sometimes
have inferred properties for mayLoad, mayStore, and hasSideEffects. This patch
provides the simplest possible fix to this problem, setting default values for
these fields in the TargetOpcode scope. There is no intended functional
change, as the explicitly set properties should match what was previously
inferred. A number of the instructions had hasSideEffects=1 inferred
unintentionally. This patch makes it explicit, while future patches (such as
D37097) correct the property.
Differential Revision: https://reviews.llvm.org/D37065
llvm-svn: 317674
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
Summary:
This just seems to have been an oversight. We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.
Reviewers: tra
Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39638
llvm-svn: 317623
An "or" that sets the sign-bit can be replaced with a "xor", if
the sign-bit was known to be clear before. With some changes to
instruction combining, the simple sign-bit check was failing.
Replace it with a more flexible one to catch more cases.
llvm-svn: 317592
Patch [5/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39091
llvm-svn: 317591
Patch [3/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
To summarise, this patch adds:
* SVE register definitions
* Methods to parse SVE register operands
* Methods to print SVE register operands
* RegKind SVEDataVector to distinguish it from other data types like scalar register or Neon vector.
* k_SVEDataRegister and SVEDataRegOp to describe SVE registers (which will be extended by further patches with e.g. ElementWidth and the shift-extend type).
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39089
llvm-svn: 317590
Patch [4/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
We add SVE as unsupported feature for CPUs that don't have SVE to prevent errors from scheduler models saying it lacks information for these instructions.
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39090
llvm-svn: 317582
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().
Original r317100 message:
"Correct dwarf unwind information in function epilogue for X86"
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
llvm-svn: 317579
Summary:
The cost calculation for default case on X86 target does not always
follow correct wayt because of missing 4-th argument in
`BaseT::getCastInstrCost()` call. Added this missing parameter.
Reviewers: hfinkel, mkuper, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39687
llvm-svn: 317576
Patch [2/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
This change is a non functional change that adds RegKind as an alternative to 'isVector' to prepare it for newer types (SVE data vectors and predicate vectors) that will be added in next patches (where the SVE data vector is added as part of this patch set)
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39088
llvm-svn: 317569
This changes the interface of how targets describe how to legalize, see
the below description.
1. Interface for targets to describe how to legalize.
In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.
For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:
for (auto Ty : {s32, s64})
setAction({G_ADD, 0, s32}, Legal);
or for a target that needs a library call for a 32 bit division:
setAction({G_SDIV, s32}, Libcall);
The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar). A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.
Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy". For example:
setLegalizeScalarToDifferentSizeStrategy(
G_ADD, 0, widenToLargerAndNarrowToLargest);
This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.
Another example, taken from the ARM backend is:
for (unsigned Op : {G_SDIV, G_UDIV}) {
setLegalizeScalarToDifferentSizeStrategy(Op, 0,
widenToLargerTypesUnsupportedOtherwise);
if (ST.hasDivideInARMMode())
setAction({Op, s32}, Legal);
else
setAction({Op, s32}, Libcall);
}
For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).
The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:
setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
setLegalizeVectorElementToDifferentSizeStrategy(
G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);
As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above. The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.
Therefore, for the above specification, some example legalizations are:
* getAction({G_ADD, LLT::vector(3, 3)})
returns {WidenScalar, LLT::vector(3, 8)}
* getAction({G_ADD, LLT::vector(3, 8)})
then returns {MoreElements, LLT::vector(8, 8)}
* getAction({G_ADD, LLT::vector(20, 8)})
returns {FewerElements, LLT::vector(16, 8)}
2. Key implementation aspects.
How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:
setScalarAction({G_ADD, LLT:scalar(1)},
{{1, WidenScalar}, // bit sizes [ 1, 31[
{32, Legal}, // bit sizes [32, 33[
{33, WidenScalar}, // bit sizes [33, 64[
{64, Legal}, // bit sizes [64, 65[
{65, NarrowScalar} // bit sizes [65, +inf[
});
Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized. Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.
I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.
This drops the need for LLT::{half,double}...Size().
Differential Revision: https://reviews.llvm.org/D30529
llvm-svn: 317560
Summary:
Calls using invoke in funclet based functions are assumed to clobber
all registers, which causes the stack adjustment using pops to consider
all registers not defined by the call to be undefined, which can
unfortunately include the base pointer, if one is needed.
To prevent this (and possibly other hazards), skip reserved registers
when looking for candidate registers.
This fixes issue #45034 in the Rust compiler.
Reviewers: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39636
llvm-svn: 317551
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).
This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.
Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.
llvm-svn: 317526
The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.
Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.
All known CPUs with AVX512 have F16C so this should safe for now.
llvm-svn: 317521
Previously our VEX patterns were checking Subtarget.hasFMA() which checked FMA || AVX512. So we were behaving as if AVX512 implied it anyway. Which means we'd allow VEX encoded 128/256 FMA when AVX512F was enabled but AVX512VL is off. Regardless of the FMA flag.
EVEX to VEX also transforms scalar EVEX FMA instructions to their VEX versions even without the FMA flag. Similarly for 128/256 under AVX512VL.
So this makes AVX512 imply FeatureFMA to make our current behavior explicit.
All known CPUs that support AVX512 have VEX FMA instructions.
llvm-svn: 317520
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html
...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic
reassociation - 'AllowReassoc'.
We're also adding a bit to allow approximations for library functions called 'ApproxFunc'
(this was initially proposed as 'libm' or similar).
...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits),
but that's apparently already used for other purposes. Also, I don't think we can just
add a field to FPMathOperator because Operator is not intended to be instantiated.
We'll defer movement of FMF to another day.
We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.
Finally, this change is binary incompatible with existing IR as seen in the
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile
them. For example, if nsw is ever replaced with something else, dropping it would be
a valid way to upgrade the IR."
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will
fail to optimize some previously 'fast' code because it's no longer recognized as
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
Note: an inter-dependent clang commit to use the new API name should closely follow
commit.
Differential Revision: https://reviews.llvm.org/D39304
llvm-svn: 317488
We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch.
llvm-svn: 317485
combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad.
Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit.
llvm-svn: 317481
SystemZ can do division and remainder in a single instruction for scalar
integer types, which are now reflected by returning true in this hook for
those cases.
Review: Ulrich Weigand
llvm-svn: 317477
The backend assumes pointer in default addr space is 32 bit, which is not
true for the new addr space mapping and causes assertion for unresolved
functions.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39643
llvm-svn: 317476
Previously, the 'movep' instruction was defined for microMIPS32r3 and
shared that definition with microMIPS32R6. 'movep' was re-encoded for
microMIPS32r6, so this patch provides the correct encoding.
Secondly, correct the encoding of the 'rs' and 'rt' operands which have
an instruction specific encoding for the registers those operands accept.
Finally, correct the decoding of the 'dst_regs' operand which was extracting
the relevant field from the instruction, but was actually extracting the
field from the alreadly extracted field.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39495
llvm-svn: 317475
Recommit:
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
fixed the location of the lit test it works with make check-all.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317471
Mark all symbols involved with TLS relocations as being TLS symbols.
This resolves PR35140.
Thanks to Alex Crichton for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39591
llvm-svn: 317470
Added TESTM and TESTNM to the list of instructions that already zeroing unused upper bits
and does not need the redundant shift left and shift right instructions afterwards.
Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) goes folds into TESTM
and icmp(eq,and(X,Y), 0) goes folds into TESTNM
This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38732
llvm-svn: 317465
Summary:
Try to lower a BUILD_VECTOR composed of extract-extract chains that can be
reasoned to be a permutation of a vector by indices in a non-constant vector.
We saw this pattern created by ISPC, which resolts to creating it due to the
requirement that shufflevector's mask operand be a *constant* vector.
I didn't check this but we could possibly use this pattern for lowering the X86 permute
C-instrinsics instead of llvm.x86 instrinsics.
This change can be followed by more improvements:
1. Handle vectors with undef elements.
2. Utilize pshufb and zero-mask-blending to support more effiecient
construction of vectors with constant-0 elements.
3. Use smaller-element vectors of same width, and "interpolate" the indices,
when no native operation available.
Reviewers: RKSimon, craig.topper
Reviewed By: RKSimon
Subscribers: chandlerc, DavidKreitzer
Differential Revision: https://reviews.llvm.org/D39126
llvm-svn: 317463
This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38684
Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540
llvm-svn: 317458
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.
llvm-svn: 317453
reverted my changes will be committed later after fixing the failure
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317433
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317432
Summary:
AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement.
Currently we use these new instructions for the legacy packed SSE instrinics, but not the scalar instrinsics. And we use it for fast math optimization of division and reciprocal sqrt.
I think switching the legacy instrinsics maybe surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.
As far at the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggest they don't change which instructions they use here.
This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.
Going forward I think our focus should be on
-Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
-Supporting double precision.
Reviewers: zvi, DavidKreitzer, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39583
llvm-svn: 317413
AMDGPULibFunc hardcodes address space values of the old address space mapping,
which causes invalid addrspacecast instructions and undefined functions in
APPSDK sample MonteCarloAsianDP.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39616
llvm-svn: 317409
This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary.
As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.
llvm-svn: 317403
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
The number of iterations was incorrectly determined for DP FP vector types
and the tests were insufficient to flag this issue.
Differential revision: https://reviews.llvm.org/D39507
llvm-svn: 317349
Change the ISel matching of 'ins', 'dins[mu]' from tablegen code to
C++ code. This resolves an issue where ISel would select 'dins' instead
of 'dinsm' when the instructions size and position were individually in
range but their sum was out of range according to the ISA specification.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39117
llvm-svn: 317331
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.
We have to account for pre-SSE41 targets not supporting PACKUSDW
llvm-svn: 317315
We're currently bailing out for Thumb targets while lowering formal
parameters, but there used to be some other checks before it, which
could've caused some functions (e.g. those without formal parameters) to
sneak through unnoticed.
llvm-svn: 317312
Ideally we should probably produce WinEH here as well, but until
then, we can use dwarf exceptions, without any further changes
required in clang, libunwind or libcxxabi.
Differential Revision: https://reviews.llvm.org/D39535
llvm-svn: 317304
This fixes http://llvm.org/PR32560. We were missing a description for
half floating point type and as a result were using the FPR 32 mapping.
Because of the size mismatch the generic code was complaining that the
default mapping is not appropriate. Fix the mapping description so that
the default mapping can be properly applied.
llvm-svn: 317287
MSA stores and loads to the stack are more likely to require an
emergency GPR spill slot due to the smaller offsets available
with those instructions.
Handle this by overestimating the size of the stack by determining
the largest offset presuming that all callee save registers are
spilled and accounting of incoming arguments when determining
whether an emergency spill slot is required.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39056
llvm-svn: 317204
The generic dag combiner will fold:
(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
(shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
This can create constants which are too large to use as an immediate.
Many ALU operations are also able of performing the shl, so we can
unfold the transformation to prevent a mov imm instruction from being
generated.
Other patterns, such as b + ((a << 1) | 510), can also be simplified
in the same manner.
Differential Revision: https://reviews.llvm.org/D38084
llvm-svn: 317197
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.
We have to account for pre-SSE41 targets not supporting PACKUSDW
llvm-svn: 317128
This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch VPALIGNR for the shorter VEX encoding.
Differential Revision: https://reviews.llvm.org/D39401
llvm-svn: 317122
This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.
We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.
Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.
Differential Revision: https://reviews.llvm.org/D39402
llvm-svn: 317112
Summary:
[X86] Teach fast isel to handle i64 sitofp with AVX.
For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX.
Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39450
llvm-svn: 317102
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D35844
llvm-svn: 317100
So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). When really we can safely use it for any case as long as the number of sign bits reach down to the last 16-bits (or 8-bits if we're truncating to bytes).
The next steps after this is add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc.
Differential Revision: https://reviews.llvm.org/D39476
llvm-svn: 317086
extract subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time end with an assertion.
This patch fixes this issue by adding new patterns to the TD file.
Reviewers:
1. guyblank
2. igorb
3. zvi
4. ayman
5. craig.topper
Differential Revision: https://reviews.llvm.org/D39292
Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8
llvm-svn: 316981
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.
Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?
llvm-svn: 316978
Revert r316478.
A test case has failed.
Will recommit this change once we find and fix the failure.
This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34.
llvm-svn: 316952
Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270.
Reviewers: jtony, aaron.ballman
Subscribers: nemanjai, kbarton
Differential Revision: https://reviews.llvm.org/D39163
llvm-svn: 316916
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.
This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.
The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.
This should fully fix PR35068 finishing the fix started in r316860.
Reviewers: RKSimon, zvi, spatel
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39411
llvm-svn: 316913
Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.
Differential Revision: https://reviews.llvm.org/D39255
llvm-svn: 316907
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For examples, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
llvm-svn: 316905
I have a future patch that wants to make use of the one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that.
llvm-svn: 316883
If the carry flag is being used, this transformation isn't safe.
This does prevent some test cases from using DEC now, but I'll try to look into that separately.
Fixes PR35068.
llvm-svn: 316860
This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type. Which means it really wasn't legal.
We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable.
llvm-svn: 316851
I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo.
llvm-svn: 316840
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
We were handling the non-hidden case in lib/Target/TargetMachine.cpp,
but the hidden case was handled in architecture dependent code and
only X86_64 and AArch64 were covered.
While it is true that some code sequences in some ABIs might be able
to produce the correct value at runtime, that doesn't seem to be the
common case.
I left the AArch64 code in place since it also forces a got access for
non-pic code. It is not clear if that is needed, but it is probably
better to change that in another commit.
llvm-svn: 316799
As far as I can tell, this matches gcc: -mfloat-abi determines the
calling convention for all functions except those explicitly defined as
soft-float in the ARM RTABI.
This change only affects cases where the user specifies -mfloat-abi to
override the default calling convention derived from the target triple.
Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.
Differential Revision: https://reviews.llvm.org/D38299
llvm-svn: 316708
If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation.
This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.
This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.
Differential Revision: https://reviews.llvm.org/D38275
llvm-svn: 316702
Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.
Differential Revision: https://reviews.llvm.org/D38756
llvm-svn: 316685
Summary:
This causes a segfault on ARM when (I think) the pass manager is used multiple times.
Reset set the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then save the LastEMSInfo for the NULL section leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL which cause the LastEMSInfo to be set to NULL when the section is being used again.
The reuse of the section (pointer) might mean that the map was holding dangling pointers previously which is why I went for clearing the map and resetting the info, making it as similar to the state right after the constructor run as possible. The AArch64 one doesn't have segfault (since LastEMS isn't a pointer) but it seems to have the same issue.
The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue was older though.
No test is included since the test is believed to be too complicated for such an obvious fix and not worth doing.
Reviewers: llvm-commits, shankare, t.p.northover, peter.smith, rengolin
Reviewed By: rengolin
Subscribers: mgorny, aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38588
llvm-svn: 316679
Currently we do not represent runtime preemption in the IR, which has several
drawbacks:
1) The semantics of GlobalValues differ depending on the object file format
you are targeting (as well as the relocation-model and -fPIE value).
2) We have no way of disabling inlining of run time interposable functions,
since in the IR we only know if a function is link-time interposable.
Because of this llvm cannot support elf-interposition semantics.
3) In LTO builds of executables we will have extra knowledge that a symbol
resolved to a local definition and can't be preemptable, but have no way to
propagate that knowledge through the compiler.
This patch adds preemptability specifiers to the IR with the following meaning:
dso_local --> means the compiler may assume the symbol will resolve to a
definition within the current linkage unit and the symbol may be accessed
directly even if the definition is not within this compilation unit.
dso_preemptable --> means that the compiler must assume the GlobalValue may be
replaced with a definition from outside the current linkage unit at runtime.
To ease transitioning dso_preemptable is treated as a 'default' in that
low-level codegen will still do the same checks it did previously to see if a
symbol should be accessed indirectly. Eventually when IR producers emit the
specifiers on all Globalvalues we can change dso_preemptable to mean 'always
access indirectly', and remove the current logic.
Differential Revision: https://reviews.llvm.org/D20217
llvm-svn: 316668
These instructions were previously marked as codegen only preventing
them from being assembled as microMIPS or disassembled.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D39123
llvm-svn: 316656
PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past
debug instructions when removing branches for the control flow optimizer, which
lead to duplicated conditional branches. If the target of the branch was a
removable block, only the conditional branch in the terminating position would
have it's MBB operands updated, leaving the first branch with a dangling MBB
operand. The MIPS long branch pass would then trigger an assertion when
attempting to examine the instruction with dangling MBB operand.
This resolves PR35071.
Thanks to Alex Richardson for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39288
llvm-svn: 316654
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.
Differential Revision: https://reviews.llvm.org/D38941
llvm-svn: 316647
In getOffsetRange, Max can be set to 0 to force the extender replacement
to be at or below the original value. This would cause the new offset to
be non-negative, which is preferred for memory instructions (to reduce
the likelihood of it getting constant-extended due to predication). The
problem happens when the range is shifted by an offset (present in the
instruction being examined) and the offset is negative. The entire range
for the allowable deviation will then be strictly negative. This creates
a problem, since 0 is assumed to be a valid deviation.
llvm-svn: 316601
As indicated by Table 1-1 in Intel Architecture Instruction Set Extensions and Future Features Programming Reference from October 2017.
llvm-svn: 316592
We were generating BLX for all the calls, which was incorrect in most
cases. Update ARMCallLowering to generate BL for direct calls, and BLX,
BX_CALL or BMOVPCRX_CALL for indirect calls.
llvm-svn: 316570
Swap the compare operands if the lhs is a shift and the rhs isn't,
as in arm and T2 the shift can be performed by the compare for its
second operand.
Differential Revision: https://reviews.llvm.org/D39004
llvm-svn: 316562
Previously, the dllimport attribute did the right thing in terms
of treating it as a pointer to a value, but this makes sure the
names get mangled properly, and calls to such functions load the
function from the __imp_ pointer.
This is based on SVN r212431 and r212430 where the same was
implemented for ARM.
Differential Revision: https://reviews.llvm.org/D38530
llvm-svn: 316555
If particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.
Differential Revision: https://reviews.llvm.org/D39026
llvm-svn: 316495
Adding the scheduling information for the Browadwell (BDW) CPU target.
This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target.
We used the scheduling information retrieved from the Broadwell architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction.
The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175.
Performance fluctuations may be expected due to code alignment effects.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39054
Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01
llvm-svn: 316492
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.
Fixed the test case that was failing and recommit after pulling the original
commit.
Original revision is here: https://reviews.llvm.org/D39009
llvm-svn: 316478
In BPF backend, we try to optimize away redundant
trunc operations so that kernel verifier rewrite
remains valid. Previous implementation only works
for a single function.
This patch fixed the issue for multiple functions.
It clears internal map data structure before
performing optimization for each function.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 316469
By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits.
llvm-svn: 316448
Report a diagnostic when we fail to parse a shift in a memory operand because
the shift type is not an identifier. Without this, we were silently ignoring
the whole instruction.
Differential revision: https://reviews.llvm.org/D39237
llvm-svn: 316441
Summary:
r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However,
X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable.
This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms.
An alternative fix would be to prevent the 'store 0/1 idioms' patterns from firing when accessing the stack. This would save
the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire we may result
in cases where neither X86CallFrameOptimization not the patterns for 'store 0/1 idioms' fire.
Fixes pr34863
Reviewers: DavidKreitzer, guyblank, aymanmus
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38738
llvm-svn: 316431
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.
Also allow kill in all shader stages.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D38544
llvm-svn: 316427
* Remove the -arm-asm-parser-dev-diags option.
* Use normal DEBUG(dbgs()) printing for the extra development information about
missing diagnostics.
Differential Revision: https://reviews.llvm.org/D39194
llvm-svn: 316423
This is the Thumb encoding, so the Requires list must include IsThumb.
No test because we happen to select the ARM one first, but that's just luck.
Differential Revision: https://reviews.llvm.org/D39190
llvm-svn: 316421
This alias caused a crash when trying to print the "cps #0" instruction in a
diagnostic for thumbv6 (which doesn't have that instruction).
The comment was incorrect, this instruction is UNPREDICTABLE if no flag bits
are set, so I don't think it's worth keeping.
Differential Revision: https://reviews.llvm.org/D39191
llvm-svn: 316420
SelectionDAG inserts a copy of ESP into a virtual register.
X86CallFrameOptimization assumed that the COPY, if present, is always
right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a
wrong assumption as the COPY can be located anywhere between the call-frame setup
instruction and its first use. If the COPY happened to be located in a different
location than what X86CallFrameOptimization assumed, visiting it while
processing the call chain would lead to a conservative bail-out.
The fix is quite straightfoward, scan ahead for the stack-pointer copy and make note
of it so it can be ignored while processing the call chain.
Fixes pr34903
Differential Revision: https://reviews.llvm.org/D38730
llvm-svn: 316416
Infrastructure designed for padding code with nop instructions in key places such that preformance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
This patch by itself in a NFC. Future patches will make use of this infrastructure to implement required policies for code padding.
Reviewers:
aaboud
zvi
craig.topper
gadi.haber
Differential revision: https://reviews.llvm.org/D34393
Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
Summary:
The motivation of this change is to enable .mir testing for this pass.
Added one test case to cover the functionality, this same case will be improved by
a future patch.
Reviewers: igorb, guyblank, DavidKreitzer
Reviewed By: guyblank, DavidKreitzer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38729
llvm-svn: 316412
In HexagonISelLowering, there is code to handle the case when
a function returns an i1 type. In this case, we need to generate
extra nodes to copy the result from R0 to a predicate register.
The code was returning the wrong value for the chain edge which
caused an assert "Wrong topological sorting" when converting the
instructions to MIs.
This patch fixes the problem by returning the chain for the final
copy.
Patch by Brendon Cahoon.
llvm-svn: 316367
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.
Differential Revision: https://reviews.llvm.org/D39009
llvm-svn: 316366
This patch enables the import of stores. Unfortunately, doing so by itself,
loses an optimization where storing 0 to memory makes use of WZR/XZR.
To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to only apply to the stores.
Applying it to GPR32/GPR64 register classes in general will be done after
review see (https://reviews.llvm.org/D39150).
llvm-svn: 316360
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.
llvm-svn: 316346
I don't think this changes anything functionally yet, but I plan to fix the disassembler to use this to disable matching certain instructions with 0xf3/0xf2/0x66 prefixes.
llvm-svn: 316337
Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection.
Differential Revision: https://reviews.llvm.org/D39169
llvm-svn: 316336
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.
Differential Revision: https://reviews.llvm.org/D38952
llvm-svn: 316313
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload
Such sequences are created in 2 scenarios:
Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors, which support only the 16-bit Thumb instruction set
the compiler ignores the alignment attributes of automatic variables and may
silently generate incorrect code.
Differential revision: https://reviews.llvm.org/D38143
llvm-svn: 316289
The pass scans the function to find instruction chains that define
registers in the same domain (closures).
It then calculates the cost of converting the closure to another domain.
If found profitable, the instructions are converted to instructions in
the other domain and the register classes are changed accordingly.
This commit adds the pass infrastructure and a simple conversion from
the GPR domain to the Mask domain.
Differential Revision:
https://reviews.llvm.org/D37251
Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
llvm-svn: 316288
This introduces a new operand type to encode the whether the index register should be XMM/YMM/ZMM. And new code to fixup the results created by readSIB.
This has the nice effect of removing a bunch of code that hard coded the name of every GATHER and SCATTER instruction to map the index type.
This fixes PR32807.
llvm-svn: 316273
Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function. This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test.
Patch by Riyaz V Puthiyapurayil
Reviewers: craig.topper, schweitz
Reviewed By: craig.topper, schweitz
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D38668
llvm-svn: 316254
Summary:
This change comes from using lld for i686-windows-msvc. Before this change, lld
emits an error of:
error: relocation against symbol in discarded section: .xdata
It's possible that this could be addressed in lld, but I think this change is
reasonable on its own.
At a high level, this is being generated:
A (.text comdat) -> B (.text) -> C (.xdata comdat)
Where A is a C++ inline function, which references B, an exception handler
thunk, which references C, the exception handling info.
With this structure, lld will error when applying relocations to B if the C it
references has been discarded (some other C has been selected).
This change checks if A is comdat, and if so places the exception registration
thunk (B) in the comdata group of A (and B).
It appears that MSVC makes the __ehhandler function comdat.
Is it possible that duplicate thunks are being emitted into the final binary
with other linkers, or are they stripping the unused thunks?
Reviewers: rnk, majnemer, compnerd, smeenai
Reviewed By: rnk, compnerd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38940
llvm-svn: 316219
Normally, if the registers holding the induction variable's bounds
are redefined inside of the loop's body, the loop cannot be converted
to a hardware loop. However, if the redefining instruction is actually
loading an immediate value into the register, this conversion is both
possible and legal (since the immediate itself will be used in the
loop setup in the preheader).
llvm-svn: 316218
The commit at https://reviews.llvm.org/rL315888 is causing some failures
with internal testing. Disabling this code until we can resolve the issues.
llvm-svn: 316199
This adds the minimum necessary to support codegen for simple ALU operations
on RV32. Prolog and epilog insertion, support for memory operations etc etc
follow in future patches.
Leave guessInstructionProperties=1 until https://reviews.llvm.org/D37065 is
reviewed and lands.
Differential Revision: https://reviews.llvm.org/D29933
llvm-svn: 316188
x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV.
This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version.
Additional test cases are already covered by iabs.ll (rL315706 and rL315711).
Differential Revision: https://reviews.llvm.org/D38895
llvm-svn: 316162
While parameterising by XLen, also take the opportunity to clean up the
formatting of the RISCV .td files.
This commit unifies the in-tree code with my patchset at
<https://github.com/lowrisc/riscv-llvm>.
llvm-svn: 316159
This patch lets the llvm tools handle the new HVX target features that
are added by frontend (clang). The target-features are of the form
"hvx-length64b" for 64 Byte HVX mode, "hvx-length128b" for 128 Byte mode HVX.
"hvx-double" is an alias to "hvx-length128b" and is soon will be deprecated.
The hvx version target feature is upgated form "+hvx" to "+hvxv{version_number}.
Eg: "+hvxv62"
For the correct HVX code generation, the user must use the following
target features.
For 64B mode: "+hvxv62" "+hvx-length64b"
For 128B mode: "+hvxv62" "+hvx-length128b"
Clang picks a default length if none is specified. If for some reason,
no hvx-length is specified to llvm, the compilation will bail out.
There is a corresponding clang patch.
Differential Revision: https://reviews.llvm.org/D38851
llvm-svn: 316101
All loads of form V6_vL32b_{,cur,nt,tmp,nt_cur,nt_tmp}_{ai,pi,ppu} are
predicable on v62 (but not on v60). Mark them all as predicable in the
instruction definitions, and handle the v60 case in HII::isPredicable.
llvm-svn: 316098
r315275 set the IsLittleEndian parameter incorrectly. This patch corrects
this, and adds a test to ensure such mistakes will be caught in the future.
llvm-svn: 316091
In the case where there was a conditional branch followed by a unconditional
branch with debug instruction separating them, MipsInstrInfo::analyzeBranch
would not skip past debug instruction when searching for the second branch
which give erroneous results about the control flow of the block.
This could lead to the branch folder to merge the non-fall through case
into it's predecessor, leaving the conditional branch with a dangling
basic block operand.
This resolves PR34975.
Thanks to Alexander Richardson for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39003
llvm-svn: 316084
This patch adds accurate instructions cost.
The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride.
Reviewers:
1. delena
2. Farhana
3. zvi
4. dorit
5. Ayal
Differential Revision: https://reviews.llvm.org/D38762
Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
llvm-svn: 316072
Helper functions to identify sign- and zero-extending machine instruction is introduced in rL315888.
This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes possible more optimizations since the helper can do more analysis than the original check code; I observed about 5000 more compare instructions are eliminated while building LLVM.
Also, this patch fixes a bug in helpers on ANDIo instruction handling due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr.
Differential Revision: https://reviews.llvm.org/D38988
llvm-svn: 316071
Note that cyclone itself doesn't fuse, but newer apple chips do and we
are using cyclone as the default when targeting apple OSes.
The current code also does not capture all fusion patterns of apple CPUs
yet; I am still looking for ways to refactor the code nicely to extend
it.
llvm-svn: 316036
If the address of a local is used in a comparison, AArch64 can fold the
address-calculation into the comparison via "adds". Unfortunately, a couple of
places (both hit in this one test) are not ready to deal with that yet and just
assume the first source operand is a register.
llvm-svn: 316035
Summary:
Make sure the map is cleared before processing a new module. Similar to what is done on `StackMaps`.
This issue is similar to D38588, though this time for FaultMaps (on x86) rather than ARM/AArch64. Other than possible mixing of information between modules, the crash is caused by the pointers values in the map that was allocated by the bump pointer allocator that is unwinded when emitting the next file. This issue has been around since 3.8.
This issue is likely much harder to write a test for since AFAICT it requires emitting something much more compilcated (and possibly real code) instead of just some random bytes.
Reviewers: skatkov, sanjoy
Reviewed By: skatkov, sanjoy
Subscribers: sanjoy, aemerson, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38924
llvm-svn: 315990
Updated the scheduling information for the SkylakeClient target with the following changes:
1. regrouped the instructions after adding load and store latencies.
2. regrouped the instructions after adding identified missing ports in several groups.
The changes were made after revisiting the latencies impact of all the load and store uOps.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D38727
Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f
llvm-svn: 315978
This reverts commit r315823, thus re-applying r315781.
Also make sure we don't use G_BITCAST mapping for non-generic registers.
Non-generic registers don't have a type but do have a reg bank.
Something the COPY mapping now how to deal with but the G_BITCAST
mapping don't.
-- Original Commit Message --
We use to resort on the generic implementation to get the mappings for
COPYs. The generic implementation resorts on table lookup and
dynamically allocated objects to get the valid mappings.
Given we already know how to map G_BITCAST and have the static mappings
for them, use that code path for COPY as well. This is much more
efficient.
Improve the compile time of RegBankSelect by up to 20%.
Note: When we eventually generate all the mappings via TableGen, we
wouldn't have to do that dance to shave compile time. The intent of this
change was to make sure that moving to static structure really pays off.
NFC.
llvm-svn: 315947
Previously these instructions were marked codegen only and had
an under-specified instruction description that did not record the
fcc register.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D38847
llvm-svn: 315905
We came across an llvm bug when compiling some testcases that 64-bit
immediates are silently truncated into 32-bit and then packed into
BPF_JMP | BPF_K encoding. This caused comparison with wrong value.
This bug looks to be introduced by r308080. The Select_Ri pattern is
supposed to be lowered into J*_Ri while the latter only support 32-bit
immediate encoding, therefore Select_Ri should have similar immediate
predicate check as what J*_Ri are doing.
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 315889
This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass.
If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated.
One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI.
For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated.
void int_func(int);
void ii_test(int a) {
if (a & 1) return int_func(a);
}
Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG.
Differential Revision: https://reviews.llvm.org/D31319
llvm-svn: 315888
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.
The previous commit failed on MSVC due to a failure to convert an
initializer_list to a std::vector. Hopefully, MSVC will accept this version.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
llvm-svn: 315887
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
llvm-svn: 315885
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Hopefully fixed the ambiguous constructor that a large number of bots reported.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
llvm-svn: 315869
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
llvm-svn: 315863
Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill of the feature flag and just fix up the comments.
llvm-svn: 315862
Summary:
This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.
There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.
Reviewers: RKSimon, zvi, delena
Reviewed By: delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38714
llvm-svn: 315860
Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell. It's also set for all Atom CPUs so I assume KNL should have it too.
Reviewers: RKSimon, zvi, gadi.haber
Reviewed By: gadi.haber
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38890
llvm-svn: 315859
Summary:
It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register.
It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day.
Reviewers: RKSimon, delena, zvi
Reviewed By: delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38932
llvm-svn: 315849
Fixes cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl
test after r315808
We may hit few other similar issues, but I want to discuss good
solution offline.
llvm-svn: 315830
- Do not allow amd_amdgpu_isa directives on non-amdgcn architectures
- Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes
- Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes
Differential Revision: https://reviews.llvm.org/D38750
llvm-svn: 315812
- Emit NT_AMD_AMDGPU_ISA
- Add assembler parsing for isa version directive
- If isa version directive does not match command line arguments, then return error
Differential Revision: https://reviews.llvm.org/D38748
llvm-svn: 315808
If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into shuffle.
Fixes the last issue with PR22415
llvm-svn: 315807
We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions.
llvm-svn: 315798
I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.
llvm-svn: 315795
We use to resort on the generic implementation to get the mappings for
COPYs. The generic implementation resorts on table lookup and
dynamically allocated objects to get the valid mappings.
Given we already know how to map G_BITCAST and have the static mappings
for them, use that code path for COPY as well. This is much more
efficient.
Improve the compile time of RegBankSelect by up to 20%.
Note: When we eventually generate all the mappings via TableGen, we
wouldn't have to do that dance to shave compile time. The intent of this
change was to make sure that moving to static structure really pays off.
NFC.
llvm-svn: 315781
This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.
For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP.
We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.
I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP.
Differential Revision: https://reviews.llvm.org/D38836
llvm-svn: 315768
Summary:
There's only a tablegen testcase for IntImmLeaf and not a CodeGen one
because the relevant rules are rejected for other reasons at the moment.
On AArch64, it's because there's an SDNodeXForm attached to the operand.
On X86, it's because the rule either emits multiple instructions or has
another predicate using PatFrag which cannot easily be supported at the
same time.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D36569
llvm-svn: 315761
Summary:
The purpose of this patch is to expose more information about ImmLeaf-like
PatLeaf's so that GlobalISel can learn to import them. Previously, ImmLeaf
could only be used to test int64_t's produced by sign-extending an APInt.
Other tests on immediates had to use the generic PatLeaf and extract the
constant using C++.
With this patch, tablegen will know how to generate predicates for APInt,
and APFloat. This will allow it to 'do the right thing' for both SelectionDAG
and GlobalISel which require different methods of extracting the immediate
from the IR.
This is NFC for SelectionDAG since the new code is equivalent to the
previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1
for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new
subclasses will require a significant re-factor of FastISel.
For GlobalISel, it's currently NFC because the relevant code to import the
affected rules is not yet present. This will be added in a later patch.
Depends on D36086
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D36534
llvm-svn: 315747
This is only currently used for mad/fma transforms.
This is the only case where it should be used for AMDGPU,
so add an opcode to be sure.
llvm-svn: 315740
Each constant extender requires an extra instruction, which adds to the
code size and also reduces the number of available slots in an instruction
packet. In most cases, the value of a repeated constant extender could be
loaded into a register, and the instructions using the extender could be
replaced with their counterparts that use that register instead.
This patch adds a pass that tries to reduce the number of constant
extenders, including extenders which differ only in an immediate offset
known at compile time, e.g. @global and @global+12.
llvm-svn: 315735
This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it.
Differential Revision: https://reviews.llvm.org/D38811
llvm-svn: 315722
Summary: We seem to inconsistently create CMOV nodes some with a Glue result and some without. But I can't find any cases that use the Glue result. So I've tried to remove all the place that did this.
Reviewers: RKSimon, spatel, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38664
llvm-svn: 315686
There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table.
llvm-svn: 315674
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.
This reverts commit r315633.
llvm-svn: 315637
Merge LLVMTargetMachine into TargetMachine.
- There is no in-tree target anymore that just implements TargetMachine
but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
interface.
Differential Revision: https://reviews.llvm.org/D38489
llvm-svn: 315633
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.
Differential Revision: https://reviews.llvm.org/D38645
llvm-svn: 315601
Add profitability checks for modifying counted loops to use the mtctr instruction.
The latency of mtctr is only justified if there are more than 4 comparisons that
will be removed as a result. Usually counted loops are formed relatively early
and before unrolling, so most low trip count loops often don't survive. However
we want to ensure that if they do, we do not mistakenly update them to mtctr loops.
Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.
Differential Revision: https://reviews.llvm.org/D38212
llvm-svn: 315592
Summary:
The interpolation mode workaround ensures that at least one
interpolation mode is enabled in PSInputAddr. It does not also check
PSInputEna on the basis that the user might enable bits in that
depending on run-time state.
However, for amdpal os type, the user does not enable some bits after
compilation based on run-time states; the register values being
generated here are the final ones set in the hardware. Therefore, apply
the workaround to PSInputAddr and PSInputEnable together. (The case
where a bit is set in PSInputAddr but not in PSInputEnable is where the
frontend set up an input arg for a particular interpolation mode, but
nothing uses that input arg. Really we should have an earlier pass that
removes such an arg.)
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37758
llvm-svn: 315591
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.
llvm-svn: 315531
- Use HSA metadata streamer directly from AMDGPUAsmPrinter
- Make naming consistent with PAL metadata
Differential Revision: https://reviews.llvm.org/D38746
llvm-svn: 315526
- Move PAL metadata definitions to AMDGPUMetadata
- Make naming consistent with HSA metadata
Differential Revision: https://reviews.llvm.org/D38745
llvm-svn: 315523
- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
- Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
- Introduce HSAMD namespace
- Other minor name changes in function and test names
llvm-svn: 315522
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.
The directives are:
.cv_fpo_proc _foo
.cv_fpo_pushreg ebp/ebx/etc
.cv_fpo_setframe ebp/esi/etc
.cv_fpo_stackalloc 200
.cv_fpo_endprologue
.cv_fpo_endproc
.cv_fpo_data _foo
I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.
I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28
Once we have cdb integration in debuginfo-tests, we can add integration
tests there.
Reviewers: majnemer, hans
Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D38776
llvm-svn: 315513
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores. Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.
Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.
This patch address both issues.
Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:
1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
check for the offset being a multiple of 16 and will convert it to an
X-Form instruction if it isn't.
Differential Revision : https://reviews.llvm.org/D38758
llvm-svn: 315500
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.
Differential Revision: https://reviews.llvm.org/D38771
llvm-svn: 315485
The software pipeliner and the packetizer try to break dependence
between the post-increment instruction and the dependent memory
instructions by changing the base register and the offset value.
However, in some cases, the existing logic didn't work properly
and created incorrect offset value.
Patch by Jyotsna Verma.
llvm-svn: 315468
The pipeliner is generating a serial sequence that causes poor
register allocation when a post-increment instruction appears
prior to the use of the post-increment register. This occurs when
there is a circular set of dependences involved with a sequence
of instructions in the same cycle. In this case, there is no
serialization of the parallel semantics that will not cause an
additional register to be allocated.
This patch fixes the problem by changing the instructions so that
the post-increment instruction is used by the subsequent
instruction, which enables the register allocator to make a
better decision and not require another register.
Patch by Brendon Cahoon.
llvm-svn: 315466
This adds debug tracing to the table-generated assembly instruction matcher,
enabled by the -debug-only=asm-matcher option.
The changes in the target AsmParsers are to add an MCInstrInfo reference under
a consistent name, so that we can use it from table-generated code. This was
already being used this way for targets that use deprecation warnings, but 5
targets did not have it, and Hexagon had it under a different name to the other
backends.
llvm-svn: 315445
MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that,
and allows us to remove another instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.
llvm-svn: 315410
Rather than using the AdditionalPredicates mechanism to guard
the microMIPS instructions, use the existing predicates to properly
guard those instructions.
This also resolves a case where an instruction pattern was incorrectly
available for microMIPS32R6, which caused a register allocation failure
as the registers specified in the pattern were not available.
Reviewers: nitesh.jain, atanasyan
Differential Revision: https://reviews.llvm.org/D38451
llvm-svn: 315362
opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass.
The heuristic in the custom selector for brcond deferred the
branch uniformity check to the pattern, which would fail.
llvm-svn: 315360
This patch adds a post-linking pass which replaces the function pointer of enqueued
block kernel with a global variable (runtime handle) and adds
runtime-handle attribute to the enqueued block kernel.
In LLVM CodeGen the runtime-handle metadata will be translated to
RuntimeHandle metadata in code object. Runtime allocates a global buffer
for each kernel with RuntimeHandel metadata and saves the kernel address
required for the AQL packet into the buffer. __enqueue_kernel function
in device library knows that the invoke function pointer in the block
literal is actually runtime handle and loads the kernel address from it
and puts it into AQL packet for dispatching.
This cannot be done in FE since FE cannot create a unique global variable
with external linkage across LLVM modules. The global variable with internal
linkage does not work since optimization passes will try to replace loads
of the global variable with its initialization value.
Differential Revision: https://reviews.llvm.org/D38610
llvm-svn: 315352
functions.
This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.
llvm-svn: 315327
Summary:
The pass to fix function bitcasts generates thunks for functions that
are called directly with a mismatching signature. It was also generating
thunks in cases where the function was address-taken, causing aliasing
problems in otherwise valid cases.
This patch tightens the restrictions for when the pass runs.
Reviewers: sunfish, dschuff
Subscribers: jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D38640
llvm-svn: 315326
Add instruction definitions for FP32 mode for recip.d and rsqrt.d.
Previously these instructions were only defined when targeting the
full 64-bit FPU model but were not guarded properly.
Reviewers: nitesh.jain, atanasyan
Differential Revision: https://reviews.llvm.org/D38400
llvm-svn: 315318
A number of record form instructions were missing from the P9 scheduling
model. Added those instructions and marked the P9 model as complete.
Differential Revision: https://reviews.llvm.org/D38560
llvm-svn: 315313
Previously, the parsing of the 'subu $reg, ($reg,) imm' relied on a parser
which also rendered the operand to the instruction. In some cases the
general parser could construct an MCExpr which was not a MCConstantExpr
which MipsAsmParser was expecting.
Address this by altering the special handling to cope with unexpected inputs
and fine-tune the handling of cases where an register name that is not
available in the current ABI is regarded as not a match for the custom parser
but also not as an outright error.
Also enforces the binutils restriction that only constants are accepted.
This partially resolves PR34391.
Thanks to Ed Maste for reporting the issue!
Reviewers: nitesh.jain, arichardson
Differential Revision: https://reviews.llvm.org/D37476
llvm-svn: 315310
Previously, the code that implemented the GNU assembler aliases for the
LDRD and STRD instructions (where the second register is omitted)
assumed that the input was a valid instruction. This caused assertion
failures for every example in ldrd-strd-gnu-bad-inst.s.
This improves this code so that it bails out if the instruction is not
in the expected format, the check bails out, and the asm parser is run
on the unmodified instruction.
It also relaxes the alias on thumb targets, so that unaligned pairs of
registers can be used. The restriction that Rt must be even-numbered
only applies to the ARM versions of these instructions.
Differential revision: https://reviews.llvm.org/D36732
llvm-svn: 315305
This adds diagnostic strings for the ARM floating-point register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.
One of these, DPR, requires C++ code to select the correct error
message, as that class contains different registers depending on the
FPU. The rest can all have their diagnostic strings stored in the
tablegen decription of them.
Differential revision: https://reviews.llvm.org/D36693
llvm-svn: 315304
This adds diagnostic strings for the ARM general-purpose register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.
One of these, rGPR, requires C++ code to select the correct error
message, as that class contains different registers in pre-v8 and v8
targets. The rest can all have their diagnostic strings stored in the
tablegen description of them.
Differential revision: https://reviews.llvm.org/D36692
llvm-svn: 315303
Summary:
Atomic buffer operations do not work (and trap on gfx9) when the
components are unaligned, even if their sum is aligned.
Previously, we generated an offset of 4156 without an SGPR by
splitting it as 4095 + 61 (immediate + inline constant). The
highest offset for which we can do this correctly is 4156 = 4092 + 64.
Fixes dEQP-GLES31.functional.ssbo.atomic.*
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37850
llvm-svn: 315302
The issue is that we assume operand zero of the input to the add instruction
is a register. In this case, the input comes from inline assembly and
operand zero is not a register thereby causing a crash.
The code will bail anyway if the input instruction doesn't have the right
opcode. So do that check first and let short-circuiting prevent the crash.
llvm-svn: 315285
This makes the .seh_ directives slightly more usable from standalone
assembly files.
This removes a large number of report_fatal_errors and recovers from the
error by ignoring the directive.
llvm-svn: 315262
to WasmObjectWriter's constructor.
Fixes the same ownership issue for COFF that r315245 did for MachO:
WasmObjectWriter takes ownership of its MCWasmObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.
llvm-svn: 315260
createWinCOFFObjectWriter to WinCOFFObjectWriter's constructor.
Fixes the same ownership issue for COFF that r315245 did for MachO:
WinCOFFObjectWriter takes ownership of its MCWinCOFFObjectTargetWriter, so we
want to pass this through to the constructor via a unique_ptr, rather than a
raw ptr.
llvm-svn: 315257
ELFObjectWriter's constructor.
Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.
llvm-svn: 315254
to MCObjectWriter's constructor.
MCObjectWriter takes ownership of its MCMachObjectTargetWriter argument -- this
patch plumbs that ownership relationship through the constructor (which
previously took raw MCMachObjectTargetWriter*) and the createMachObjectWriter
function.
llvm-svn: 315245
We end up creating COPY's that are either truncating/extending and this
should be illegal.
https://reviews.llvm.org/D37640
Patch for X86 and ARM by igorb, rovka
llvm-svn: 315240
Summary:
On behalf of julia.koval@intel.com
The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints).
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)
There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987).
The example of special case code:
```
void foo(unsigned short *p, int max, int n) {
int i;
unsigned m;
for (i = 0; i < n; i++) {
m = *--p;
*p = (unsigned short)(m >= max ? m-max : 0);
}
}
```
Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero.
Here is the table of types, I try to support, special case items are bold:
| Size | 128 | 256 | 512
| ----- | ----- | ----- | -----
| i8 | v16i8 | v32i8 | v64i8
| i16 | v8i16 | v16i16 | v32i16
| i32 | | **v8i32** | **v16i32**
| i64 | | | **v8i64**
Reviewers: zvi, spatel, DavidKreitzer, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37534
llvm-svn: 315237
E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an
intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the
inverted condition code for the CSEL.
rdar://28495949
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D38160
llvm-svn: 315205
We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX.
Differential Revision: https://reviews.llvm.org/D38609
llvm-svn: 315201
Return the combined shuffle from combineX86ShufflesRecursively and perform the combineTo in the caller.
Makes it easier for future patches to use this in functions that aren't actually shuffles themselves.
llvm-svn: 315195
Summary:
We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD.
Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.
Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd.
This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.
The rest of the patch is removing all the excess scalar patterns.
Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38023
llvm-svn: 315181
Adding the scheduling information for the SkylakeServer (SKX) target.
This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443
Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
After the original commit ([[ https://reviews.llvm.org/rL304088 | rL304088 ]]) was reverted, a discussion in llvm-dev was opened on 'how to accomplish this task'.
In the discussion we concluded that the best way to achieve our goal (which is to automate the folding tables and remove the manually maintained tables) is:
# Commit the tablegen backend disabled by default.
# Proceed with an incremental updating of the manual tables - while checking the validity of each added entry.
# Repeat previous step until we reach a state where the generated and the manual tables are identical. Then we can safely remove the manual tables and include the generated tables instead.
# Schedule periodical (1 week/2 weeks/1 month) runs of the pass:
- if changes appear (new entries):
- make sure the entries are legal
- If they are not, mark them as illegal to folding
- Commit the changes (if there are any).
CMake flag added for this purpose is "X86_GEN_FOLD_TABLES". Building with this flags will run the pass and emit the X86GenFoldTables.inc file under build/lib/Target/X86/ directory which is a good reference for any developer who wants to take part in the effort of completing the current folding tables.
Differential Revision: https://reviews.llvm.org/D38028
llvm-svn: 315173
This attribute will be used in a tablegen backend that generated the X86 memory folding tables which will be added in a future pass.
Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass.
Differential Revision: https://reviews.llvm.org/D38027
llvm-svn: 315171
Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs.
Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead.
Differential Revision: https://reviews.llvm.org/D38506
llvm-svn: 315150
The SjLj intrinsics in the X86 backend are intended for use with
SjLj exception handling as well, since SVN r271244.
Differential Revision: https://reviews.llvm.org/D38532
llvm-svn: 315146
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!
To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.
Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.
Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda
Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic
Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb
Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.
llvm-svn: 315136
Patch to fix ternlog instructions with a folded
broadcast. The broadcast decorator, e.g. {1toX}, was missing.
Differential Revision: https://reviews.llvm.org/D38649
llvm-svn: 315122
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.
Passes OpenCL conformance test_integer_ops quick_[u]long_math
Differential Revision: https://reviews.llvm.org/D38607
llvm-svn: 315081
Unfortunately TableGen doesn't handle this yet:
Unable to deduce gMIR opcode to handle Src (which is a leaf).
Just add some temporary hand-written code to generate the proper MOVsr.
llvm-svn: 315071
The machine scheduler (before register allocation) is enabled by default for
SystemZ.
The SelectionDAG scheduling preference now becomes source order scheduling
(was regpressure).
Review: Ulrich Weigand
https://reviews.llvm.org/D37977
llvm-svn: 315063
Add extending loads and constant offset patterns
A bit more refactoring of the tablegen to make the patterns fairly nice and
uniform between the regular and atomic loads.
Differential Revision: https://reviews.llvm.org/D38523
llvm-svn: 315022
The new format is changeAddrMode_xx_yy, where xx is the current mode,
and yy is the new one.
Old name: New name:
getBaseWithImmOffset changeAddrMode_abs_io
getAbsoluteForm changeAddrMode_io_abs
getBaseWithRegOffset changeAddrMode_io_rr
xformRegToImmOffset changeAddrMode_rr_io
getBaseWithLongOffset changeAddrMode_rr_ur
getRegShlForm changeAddrMode_ur_rr
llvm-svn: 315013
Sink the insertion of "pop ebp" out of the frame size calculation
branches. They all check for HasFP.
Our handling of CLEANUPRET and CATCHRET was equivalent, both are
funclets and use the same frame size. We can eliminate the CLEANUPRET
case.
Hoist the hasFP(MF) query into a local bool.
Rename TargetMBB to CatchRetTarget to be more descriptive.
Eliminate the Optional<unsigned> RetOpcode local, now that it has one
use.
It's only a net savings of 10 lines, but hopefully it's *slightly* more
readable.
llvm-svn: 315000
Implement .set dspr2 directive with appropriate feature bits. This
directive is a counterpart of -mattr=dspr2 command line option with the
exception that it does not influence elf header flags.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D38537
llvm-svn: 314994
The old algoritm was not correct, although it worked most of the time.
Avoid the complex reachability analysis and simply calculate the maximal
registers out of the set of all referenced registers.
llvm-svn: 314991
The code which lowers BUILD_VECTOR of consecutive loads into a single vector
load doesn't update chains properly. As a result the vector load can be
reordered with the store to the same location.
The current code in EltsFromConsecutiveLoads only updates the chain following
the first load. The fix is to update the chains following all the loads
comprising the vector.
This is a fix for PR10114.
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D38547
llvm-svn: 314988
Previously, instructions that were defined to use the FGR64 register class
were associated with the Mips64 table which was incorrect.
Reviewers: nitesh.jain, atanasyan
Differential Revision: https://reviews.llvm.org/D38454
llvm-svn: 314976
This is a follow-up to https://reviews.llvm.org/D38138.
I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping
any options by using default param values.
llvm-svn: 314930
It broke the Chromium / SQLite build; see PR34830.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314919
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.
This should fix PR33079
Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.
Differential Revision: https://reviews.llvm.org/D38449
llvm-svn: 314914
Early out from vector shift by immediates that will exceed eltsize - don't bother making an unnecessary ComputeNumSignBits recursive call.
llvm-svn: 314903
Previously, on long branches (relative jumps of >4 kB), an assertion
failure was hit, as AVRInstrInfo::insertIndirectBranch was not
implemented. Despite its name, it is called by the branch relaxator
for *all* unconditional jumps.
Patch by Thomas Backman.
llvm-svn: 314891
In some cases, the code generator attempts to generate instructions such as:
lddw r24, Y+63
which expands to:
ldd r24, Y+63
ldd r25, Y+64 # Oops! This is actually ld r25, Y in the binary
This commit limits the first offset to 62, and thus the second to 63.
It also updates some asserts in AVRExpandPseudoInsts.cpp, including for
INW and OUTW, which appear to be unused.
Patch by Thomas Backman.
llvm-svn: 314890
This adds diagnostics for invalid immediate operands to the MOVW and MOVT
instructions (ARM and Thumb).
Differential revision: https://reviews.llvm.org/D31879
llvm-svn: 314888
Currently, our diagnostics for assembly operands are not consistent.
Some start with (for example) "immediate operand must be ...",
and some with "operand must be an immediate ...". I think the latter
form is preferable for a few reasons:
* It's unambiguous that it is referring to the expected type of operand, not
the type the user provided. For example, the user could provide an register
operand, and get a message taking about an operand is if it is already an
immediate, just not in the accepted range.
* It allows us to have a consistent style once we add diagnostics for operands
that could take two forms, for example a label or pc-relative memory operand.
Differential revision: https://reviews.llvm.org/D36689
llvm-svn: 314887
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
Reviewed By: lsaba
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314886
The previous version didn't work if the jump table base address didn't
fit in 32 bit, since it was encoded as an immediate offset. And in case
the jump table is encoded as 32 bit label differences, we need to
load and add them to the table base first.
This solves the first half of the issues mentioned in PR34720.
Also fix some of the errors pointed out by -verify-machineinstrs, by
using GR32_NOSPRegClass.
Differential Revision: https://reviews.llvm.org/D38333
llvm-svn: 314876
Summary:
After r308422 we defer optimizations that can destroy loop canonical forms to
LateSimplifyCFG. Running LateSimplifyCFG after expanding atomic operations
can exploit more control-flow opportunities.
Reviewers: mcrosier, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38262
llvm-svn: 314857
This was dead when it landed in r252578. We have this functionality, if
not for stack probe calls, but for regular calls in
X86CallFrameOptimization.cpp.
llvm-svn: 314845
Summary:
For the amdpal OS type:
We write an AMDGPU_PAL_METADATA record in the .note section in the ELF
(or as an assembler directive). It contains key=value pairs of 32 bit
ints. It is a merge of metadata from codegen of the shaders, and
metadata provided by the frontend as _amdgpu_pal_metadata IR metadata.
Where both sources have a key=value with the same key, the two values
are ORed together.
This .note record is part of the amdpal ABI and will be documented in
docs/AMDGPUUsage.rst in a future commit.
Eventually the amdpal OS type will stop generating the .AMDGPU.config
section once the frontend has safely moved over to using the .note
records above instead of .AMDGPU.config.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37753
llvm-svn: 314829