These are the operations that are trivially identical. Division is omitted for
now because you need to use the correct sign/zero extension.
llvm-svn: 277775
Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown.
Patch by Aaron En Ye Shi.
Differential Revision: https://reviews.llvm.org/D22964
llvm-svn: 277759
Previously, FastISel for WebAssembly wasn't checking the return value of
`getRegForValue` in certain cases, which would generate instructions
referencing NoReg. This patch fixes this behavior.
Patch by Dominic Chen
Differential Revision: https://reviews.llvm.org/D23100
llvm-svn: 277742
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.
Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
for a maximum size allowed and relies on the next condition checking for %4 for correctness.
This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).
Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.
Reviewers: arsenm, jlebar, tstellarAMD
Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23068
llvm-svn: 277735
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.
The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.
Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
Enable tail calls by default for (micro)MIPS(64).
microMIPS is slightly more tricky than doing it for MIPS(R6) or microMIPSR6.
microMIPS has two instruction encodings: 16bit and 32bit along with some
restrictions on the size of the instruction that can fill the delay slot.
For safe tail calls for microMIPS, the delay slot filler attempts to find
a correct size instruction for the delay slot of TAILCALL pseudos.
Reviewers: dsanders, vkalintris
Subscribers: jfb, dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D21138
llvm-svn: 277708
This should ensure that we can atomically write two bytes (on top of the
retq and the one past it) and have those two bytes not straddle cache
lines.
We also move the label past the alignment instruction so that we can refer
to the actual first instruction, as opposed to potential padding before the
aligned instruction.
Update the tests to allow us to reflect the new order of assembly.
Reviewers: rSerge, echristo, majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23101
llvm-svn: 277701
This patch fixes pr25548.
Current implementation of PPCBoolRetToInt doesn't handle CallInst correctly, so it failed to do the intended optimization when there is a CallInst with parameters. This patch fixed that.
llvm-svn: 277655
We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero).
This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask.
We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering.
This will allow us to combine to 2-input shuffles in a future patch.
Differential Revision: https://reviews.llvm.org/D22859
llvm-svn: 277631
Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes.
Reviewers: john.brawn, jmolloy
Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23090
llvm-svn: 277610
When the same base address is used to load two different data types, LSR
would assume a memory type of "void". This type is not sized and has no
alignment information. Checking for it causes a crash.
llvm-svn: 277601
Summary:
We also add a test to show what currently happens when we create a
section per function and emit an xray_instr_map. This illustrates the
relationship (or lack thereof) between the per-function section and the
xray_instr_map section.
We also change the code generation slightly so that we don't always
create group sections, but rather only do so if a function where the
table is associated with is in a group.
Also in this change:
- Remove the "merge" flag on the xray_instr_map section.
- Test that we're generating the right table for comdat and non-comdat functions.
Reviewers: echristo, majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23104
llvm-svn: 277580
In this particular example we wouldn't want the smmls anyway (the value is
actually unused), but in general smmls does not provide the required flags
register so if that SUBE result is used we can't replace it.
llvm-svn: 277541
We were relying on the misleadingly-names $status result to actually be the
status. Actually it's just a scratch register that may or may not be valid (and
is the inverse of the real ststus anyway). Success can be determined by
comparing the value loaded against the one we wanted to see for "cmpxchg
strong" loops like this.
Should fix PR28819.
llvm-svn: 277513
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.
For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.
This is a candidate for the 3.9 release branch.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D22675
llvm-svn: 277504
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.
The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.
This is a candidate for the 3.9 branch, as it fixes a possible hang.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22673
llvm-svn: 277500
Identify patterns where the address is aligned to an 8-byte boundary,
but both the base address and the constant offset are both proper
multiples of 4. In such cases, extract Base+4 into a separate instruc-
tion, and use S2_storerd_io, instead of using S4_storerd_rr.
llvm-svn: 277497
Recommitting after fixing overaggressive fastpath return in parsing.
Fix intel syntax special case identifier operands that refer to a constant
(e.g. .set <ID> n) to be interpreted as immediate not memory in parsing.
Associated commit to fix clang test commited shortly.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22585
llvm-svn: 277489
We currently use and test these, and select most of them. Mark them
as legal even though we don't go through the full ir->asm flow yet.
This doesn't currently have standalone tests, but the verifier will
soon learn to check that the regbankselect/select tests are legal.
llvm-svn: 277471
Added (sra (shl x, 16), 16) to the sext_16_node PatLeaf for ARM to
simplify some pattern matching. This has allowed several patterns
for smul* and smla* to be removed as well as making it easier to add
the matching for the corresponding instructions for Thumb2 targets.
Also added two Pat classes that are predicated on Thumb2 with the
hasDSP flag and UseMulOps flags. Updated the smul codegen test with
the wider range of patterns plus the ThumbV6 and ThumbV6T2 targets.
Differential Revision: https://reviews.llvm.org/D22908
llvm-svn: 277450
These changes update the schedule model for the P5600 and includes the
rest of the MSA and MIPS32R5 instruction sets.
Reviewers: dsanders, vkalintris
Differential Revision: https://reviews.llvm.org/D21835
llvm-svn: 277441
Summary:
Commit 276701 requires that targets have the DSP extensions to use
certain saturating instructions. This requires some corrections.
For ARM ISA the instructions in question are available in all v6*
architectures.
For Thumb2, the instructions in question are available from v6T2.
SSAT and USAT are part of the base architecture while SSAT16 and
USAT16 require the DSP extensions.
Reviewers: rengolin
Subscribers: aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23010
llvm-svn: 277439
Summary: This patch implements CFI for WebAssembly. It modifies the
LowerTypeTest pass to pre-assign table indexes to functions that are
called indirectly, and lowers type checks to test against the
appropriate table indexes. It also modifies the WebAssembly backend to
support a special ".indidx" assembly directive that propagates the table
index assignments out to the linker.
Patch by Dominic Chen
Differential Revision: https://reviews.llvm.org/D21768
llvm-svn: 277398
Summary: This patch includes asm.js-style exception handling support for
WebAssembly. The WebAssembly MVP does not have any support for
unwinding or non-local control flow. In order to support C++ exceptions,
emscripten currently uses JavaScript exceptions along with some support
code (written in JavaScript) that is bundled by emscripten with the
generated code.
This scheme lowers exception-related instructions for wasm such that
wasm modules can be compatible with emscripten's existing scheme and
share the support code.
Patch by Heejin Ahn
Differential Revision: https://reviews.llvm.org/D22958
llvm-svn: 277391
Scavenging slots were only reserved when pseudo-instruction expansion in
frame lowering created new virtual registers. It is possible to still
need a scavenging slot even if no virtual registers were created, in cases
where the stack is large enough to overflow instruction offsets.
llvm-svn: 277355
Summary:
Allocating an AFGR64 shadows two GPR32's instead of just one.
This fixes an LNT regression detected by our internal buildbots.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D23012
llvm-svn: 277348
The branch relaxation pass is computing the wrong offsets because it assumes
TLSDESC_CALLSEQ eats up 4 bytes, when in fact it is lowered to an instruction
sequence taking up 16 bytes. This can become a problem in huge files with lots
of TLS accesses, as it may slowly move branch targets out of the range computed
by the branch relaxation pass.
Fixes PR24234 https://llvm.org/bugs/show_bug.cgi?id=24234
Differential Revision: https://reviews.llvm.org/D22870
llvm-svn: 277331
Initialize all AArch64-specific passes in the TargetMachine so they can be run
by llc. This can lead to conflicts in opt with some command line options that
share the same name as the pass, so I took this opportunity to do some cleanups:
* rename all relevant command line options from "aarch64-blah" to
"aarch64-enable-blah" and update the tests accordingly
* run clang-format on their declarations
* move all these declarations to a common place (the TargetMachine) as opposed
to having them scattered around (AArch64BranchRelaxation and
AArch64AddressTypePromotion were the only offenders)
llvm-svn: 277322
As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit.
Differential Revision: https://reviews.llvm.org/D23000
llvm-svn: 277299
Removed AssertZext node, which was inserted between X86ISD::SETCC and "truncate to i1".
Differential Revision: https://reviews.llvm.org/D22850
llvm-svn: 277289
These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS.
We may decide to split the latter up with finer-grained restrictions later, if
necessary.
llvm-svn: 277224
Up until now, we only had code to match PSADBW patterns that look like what
comes out of the loop vectorizer - a partial reduction inside the loop body
that gets fed into a horizontal operation in a different basic block.
This adds support for straight-line patterns, like those generated by the
SLP vectorizer.
Differential Revision: https://reviews.llvm.org/D22889
llvm-svn: 277219
Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors.
llvm-svn: 277214
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.
Patch by Pranav Bhandarkar.
llvm-svn: 277178
For MachineInstrBuilder, having to manually use RegState::Define is ugly and
makes register definitions clunkier than they need to be, so this adds two
convenience functions: addDef and addUse.
For MachineIRBuilder, we want to avoid BuildMI's first-reg-is-def rule because
it's hidden away and causes bugs. So this patch switches buildInstr to
returning a MachineInstrBuilder and adding *all* operands via addDef/addUse.
NFC.
llvm-svn: 277176
Mostly straightforward as we ignore addressing modes and just
use the base + unsigned immediate offset (always 0) variants.
This currently fails to select extloads because we have yet to
agree on a representation.
llvm-svn: 277171
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
llvm-svn: 277169
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single of double) we can
generate vpacko or vpacke instruction respectively.
E.g.
%42 = shufflevector <32 x i16> %37, <32 x i16> %41,
<32 x i32> <i32 1, i32 3, ..., i32 63>
is %42.h = vpacko(%41.w, %37.w)
Patch by Pranav Bhandarkar.
llvm-svn: 277168
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.
Patch by Tobias Edler von Koch.
llvm-svn: 277151
The post register allocator scheduler can generate poor schedules
because the scoreboard hazard recognizer is unable to identify
hazards for Hexagon precisely. Instead, Hexagon should use a DFA
based hazard recognizer.
Patch by Brendon Cahoon.
llvm-svn: 277143
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.
This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.
The previous commit had an uninitialized variable that caused the incoming
argument region to have undefined size. This has been fixed.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22680
llvm-svn: 277136
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.
Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently.
We avoid performing this on AVX512 as it should have better alternative truncation instructions.
Differential Revision: https://reviews.llvm.org/D22814
llvm-svn: 277132
Summary:
The MOV/MOVT instructions being chosen for struct_byval predicates was
conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction
being incorrectly emitted in Thumb1 mode. This is especially apparent
with v8-m.base targets. This patch ensures that Thumb instructions are
emitted in both Thumb modes.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, aemerson, rengolin
Differential Revision: https://reviews.llvm.org/D22865
llvm-svn: 277128
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.
Differential Revision: https://reviews.llvm.org/D22885
llvm-svn: 277126
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
I'm not convinced the patterns for the rm_Int was correct anyway. It had a tied source that should't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.
llvm-svn: 277098
Normally, CFI instructions should be inserted after allocframe, but
if allocframe is in the same packet with a call, the CFI instructions
should be inserted before that packet.
llvm-svn: 277020
LLT() has a particular meaning: it's one invalid type. But we really
want selected instructions to have no type whatsoever.
Also verify that types don't linger after ISel, and enable the verifier
on the AArch64 select test.
llvm-svn: 277001
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.
This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22680
llvm-svn: 276982
Summary:
We were using reserved VGPRs for SGPR spilling and this was causing
some programs with a workgroup size of 1024 to use more than 64
registers, which is illegal.
Reviewers: arsenm, mareko, nhaehnle
Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22032
llvm-svn: 276980
Summary:
SI_ELSE is lowered into two parts:
s_or_saveexec_b64 dst, src (at the start of the basic block)
s_xor_b64 exec, exec, dst (at the end of the basic block)
The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction
s_and_b64 exec, exec, s[...]
which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:
s_or_savexec_b64 dst, src
s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
s_xor_b64 exec, exec, dst
Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.
Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit
s_or_saveexec_b64 dst, src
...
s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst
and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions
s_and_b64 vcc, exec, vcc
before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.
In any case, this current patch could help in the short term with the whole
ExecModified business.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22846
llvm-svn: 276972
Add unittest to {ARM | AArch64}TargetParser,and by the way correct problems as below:
1.Correct a incorrect indexing problem in AArch64TargetParser. The architecture enumeration
is shared across ARM and AArch64 in original implementation.But In the code,I just used the
index which was offset by the ARM, and this would index into the array incorrectly. To make
AArch64 has its own arch enum,or we will do a lot of slowly iterating.
2.Correct a spelling error. The parameter of llvm::AArch64::getArchExtName.
3.Correct a writing mistake, in llvm::ARM::parseArchISA.
Differential Revision: https://reviews.llvm.org/D21785
llvm-svn: 276957
Before adding a new preheader block, check if there is a candidate block
where the loop setup could be placed speculatively. This will be off by
default.
llvm-svn: 276919
Fix intel syntax special case identifier operands that refer to a constant
(e.g. .set <ID> n) to be interpreted as immediate not memory in parsing.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22585
llvm-svn: 276895
The callee-saved registers that are saved in a function are not pristine,
and so they can be defined and used. In case of shrink-wrapping though,
there are blocks that are outside of the save/restore range, and in those
blocks the saved registers must be treated as pristine. To avoid any uses
of these registers, add them as live-in in all those blocks.
This was already done for blocks reaching function exits after restore,
add code that does the same for blocks reached from the function entry
before save.
llvm-svn: 276886
TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.
llvm-svn: 276883
Summary:
This is one possible solution to the problem of ignoring constraints that Simon
raised in D21473 but it's a bit of a hack.
The integrated assembler currently ignores violations of the tied register
constraints when the operands involved in a tie are both present in the AsmText.
For example, 'dati $rs, $rt, $imm' with the '$rs = $rt' will silently replace
$rt with $rs. So 'dati $2, $3, 1' is processed as if the user provided
'dati $2, $2, 1' without any diagnostic being emitted.
This is difficult to solve properly because there are multiple parts of the
matcher that are silently forcing these constraints to be met. Tied operands are
rendered to instructions by cloning previously rendered operands but this is
unnecessary because the matcher was already instructed to render the operand it
would have cloned. This is also unnecessary because earlier code has already
replaced the MCParsedOperand with the one it was tied to (so the parsed input
is matched as if it were 'dati <RegIdx 2>, <RegIdx 2>, <Imm 1>'). As a result,
it looks like fixing this properly amounts to a rewrite of the tied operand
handling which affects all targets.
This patch however, merely inserts a checking hook just before the
substitution of MCParsedOperands and the Mips target overrides it. It's not
possible to accurately check the registers are the same this early (because
numeric registers haven't been bound to a register class yet) so it cheats a
bit and checks that the tokens that produced the operand are lexically
identical. This works because tied registers need to have the same register
class but it does have a flaw. It will reject 'dati $4, $a0, 1' for violating
the constraint even though $a0 ends up as the same register as $4.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D21994
llvm-svn: 276867
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the PowerPC backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
There was one piece of questionable code in PPCInstrInfo::AnalyzeBranch,
where a condition checked a pointer converted from an iterator for
nullptr. Since this case is impossible (moreover, the code above
guarantees that the iterator is valid), I removed the check when I
changed the pointer to a reference.
Despite that case, there should be no functionality change here.
llvm-svn: 276864
Currently, for ARMCOFFMCAsmInfoMicrosoft, no comment character is set, thus the
idefault, '#', is used.
The hash character doesn't work as comment character in ARM assembly, since '#'
is used for immediate values.
The comment character is set to ';', which is the comment character used by MS
armasm.exe. (The microsoft armasm.exe uses a different directive syntax than
what LLVM currently supports though, similar to ARM's armasm.)
This allows inline assembly with immediate constants to be built (and brings the
assembly output from clang -S closer to being possible to assemble).
A test is added that verifies that ';' is correctly interpreted as comments in
this mode, and verifies that assembling code that includes literal constants
with a '#' works.
Patch by Martin Storsjö.
llvm-svn: 276859
Consider this case:
vreg1 = A2_zxth vreg0 (1)
...
vreg2 = A2_zxth vreg1 (2)
Redundant instruction elimination could delete the instruction (1)
because the user (2) only cares about the low 16 bits. Then it could
delete (2) because the input is already zero-extended. The problem
is that the properties allowing each individual instruction to be
deleted depend on the existence of the other instruction, so either
one can be deleted, but not both.
The existing check for this situation in RIE was insufficient. The
fix is to update all dependent cells when an instruction is removed
(replaced via COPY) in RIE.
llvm-svn: 276792
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.
llvm-svn: 276766
When the packetizer wants to put a store to a stack slot in the same
packet with an allocframe, it updates the store offset to reflect the
value of SP before it is updated by allocframe. If the store cannot
be packetized with the allocframe after all, the offset needs to be
updated back to the previous value.
llvm-svn: 276749
- More informative message when extension name is not an identifier token.
- Stop parsing directive if extension is unknown (avoid duplicate error
messages).
- Report unsupported extensions with a source location, rather than
report_fatal_error.
Differential Revision: https://reviews.llvm.org/D22806
llvm-svn: 276748
This option, compatible with gas's -mimplicit-it, controls the
generation/checking of implicit IT blocks in ARM/Thumb assembly.
This option allows two behaviours that were not possible before:
- When in ARM mode, emit a warning when assembling a conditional
instruction that is not in an IT block. This is enabled with
-mimplicit-it=never and -mimplicit-it=thumb.
- When in Thumb mode, automatically generate IT instructions when an
instruction with a condition code appears outside of an IT block. This
is enabled with -mimplicit-it=thumb and -mimplicit-it=always.
The default option is -mimplicit-it=arm, which matches the existing
behaviour (allow conditional ARM instructions outside IT blocks without
warning, and error if a conditional Thumb instruction is outside an IT
block).
The general strategy for generating IT blocks in Thumb mode is to keep a
small list of instructions which should be in the IT block, and only
emit them when we encounter something in the input which means we cannot
continue the block. This could be caused by:
- A non-predicable instruction
- An instruction with a condition not compatible with the IT block
- The IT block already contains 4 instructions
- A branch-like instruction (including ALU instructions with the PC as
the destination), which cannot appear in the middle of an IT block
- A label (branching into an IT block is not legal)
- A change of section, architecture, ISA, etc
- The end of the assembly file.
Some of these, such as change of section and end of file, are parsed
outside of the ARM asm parser, so I've added a new virtual function to
AsmParser to ensure any previously-parsed instructions have been
emitted. The ARM implementation of this flushes the currently pending IT
block.
We now have to try instruction matching up to 3 times, because we cannot
know if the current IT block is valid before matching, and instruction
matching changes depending on the IT block state (due to the 16-bit ALU
instructions, which set the flags iff not in an IT block). In the common
case of not having an open implicit IT block and the instruction being
matched not needing one, we still only have to run the matcher once.
I've removed the ITState.FirstCond variable, because it does not store
any information that isn't already represented by CurPosition. I've also
updated the comment on CurPosition to accurately describe it's meaning
(which this patch doesn't change).
Differential Revision: https://reviews.llvm.org/D22760
llvm-svn: 276747
Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding.
This was only unearthed when rL276102 started using the intrinsic again.....
llvm-svn: 276740
MIPS64R6 compact branch support. As the MIPS LLVM backend uses distinct
MachineInstrs for certain 32 and 64 bit instructions (e.g. BEQ & BEQ64) that
map to the same instruction, extend compact branch support for the
corresponding 64bit branches.
Reviewers: dsanders
Differential Revision: https://reviews.llvm.org/D20164
llvm-svn: 276739
Add the instruction alias sgtu (register form only), two operand forms of
s[rl]l and sra, and missing single/two operand forms of dnegu/neg.
Reviewers: dsanders
Differential Revision: https://reviews.llvm.org/D22752
llvm-svn: 276736
The saturation instructions appeared in v6T2, with DSP extensions, but they
were being accepted / generated on any, with the new introduction of the
saturation detection in the back-end. This commit restricts the usage to
DSP-enable only cores.
Fixes PR28607.
llvm-svn: 276701
They're basically i64 for AArch64, but we'll leave them intact for stranger
targets. Also add some tests for the (very few) other cases we can handle right
now.
llvm-svn: 276689
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.
The corresponding change to clang is in: http://reviews.llvm.org/D16538
Patch by: Joel Jones
Differential Revision: https://reviews.llvm.org/D16213
llvm-svn: 276654
Avoid MipsAnalyzeImmediate usage if the constant fits in an 32-bit
integer. This allows us to generate the same instructions for the
materialization of the same constants regardless the width of their
type.
Patch by: Vasileios Kalintiris
Contributions by: Simon Dardis
Reviewers: Daniel Sanders
Differential Review: https://reviews.llvm.org/D21689
llvm-svn: 276628
Follow up to r276624. Changes bits 22-20 to be parameters to
instruction class.
Differential Revision: https://reviews.llvm.org/D22562
llvm-svn: 276626
This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in the their name. This new format should be consistent with the general naming of instructions.
llvm-svn: 276559
This reverts commit r276298.
Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.
This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01
llvm-svn: 276498
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.
llvm-svn: 276461
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.
This should probably be merged to the release branch.
llvm-svn: 276438
- FuncNode::findBlock traverses the function every time. Avoid using it,
and keep a cache of block addresses in DataFlowGraph instead.
- The operator[] in the map of definition stacks was very slow. Replace
the map with unordered_map.
llvm-svn: 276429
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be be submitted shortly).
Differential Revision: https://reviews.llvm.org/D22460
llvm-svn: 276416
functions so that the size computation is available not only in ConstantIslands
but in other passes as well.
Differential Revision: https://reviews.llvm.org/D22640
llvm-svn: 276399
This variant is (as documented in the TD) for disassembler use only, and should
not be used in patterns - it is longer, and is broken on 64-bit.
llvm-svn: 276347
when constraint "w" is used on a 32-bit operand.
This enables compiling the following code, which used to error out in
the backend:
void foo1(int a) {
asm volatile ("sqxtn h0, %s0\n" : : "w"(a):);
}
Fixes PR28633.
llvm-svn: 276344
Summary:
This change also changes findMatchingInsn and
findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account
when tracking register defs and uses, which could potentially inhibit
these optimizations in the presence of debug information.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22582
llvm-svn: 276293
Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW.
But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register.
This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSSE41).
Fix for PR27265.
Differential Revision: https://reviews.llvm.org/D22509
llvm-svn: 276289
As requested on D22509, I've pulled out the v8i16 extraction lowering as the SSE41 and pre-SSE41 implementations are effectively the same.
llvm-svn: 276285
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Differential Revision: https://reviews.llvm.org/D22460
llvm-svn: 276281
classifyLEAReg() deals with switching operands from 32bit to 64bit in
order to use a LEA64_32 instruction (for three address code goodness).
It currently performs a liveness analysis to determine the kill/undef
flag for the newly added operand. This should not be necessary:
- If the previous operand had a kill flag, then the 32bit part of the
register gets killed, this will kill the super register as well.
- If the previous operand had an undef flag then we didn't care what
value we read, just use the same flag on the new operand.
(No matter what an operand with an undef flag won't affect liveness)
This makes the code independent of the presence of kill flags because it
avoids a call to MachineBasicBlock::computeRegisterLiveness().
Differential Revision: http://reviews.llvm.org/D22283
llvm-svn: 276222
At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward
to select it in fast-isel, and let the pseudo be expanded later.
extractvalues on the result are the tricky part: the generic logic
only works for legal types (and it would be painful to make it
support illegal types), so we can only support i32/i64 cmpxchg.
llvm-svn: 276183
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).
llvm-svn: 276158
Avoid unnecessary spills of byval arguments of device functions to
local space on SASS level and subsequent pointer conversion to generic
address space that follows. Instead, make a local copy in IR, provide
a way to access arguments directly, and let LLVM optimize the copy away
when possible.
Differential Review: https://reviews.llvm.org/D21421
llvm-svn: 276153
This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better.
Differential Revision: https://reviews.llvm.org/D22456
llvm-svn: 276104
Retry r275776 (no changes, we suspect the issue was with another commit).
The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.
The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.
Fixes PR26038
Differential Revision: https://reviews.llvm.org/D22103
llvm-svn: 276101
Inference of the 'returned' attribute was fixed in r276008, lets try
turning the backend support back on.
This reverts commit r275677.
llvm-svn: 276081
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.
Simplify the lowering tests by using per function
subtarget features.
llvm-svn: 276051
There's not much functional change, but it really is an architectural feature
(on v6T2, v7A, v7R and v7EM) rather than something each CPU implements
individually.
The main functional change is the default behaviour you get when specifying
only "-triple".
llvm-svn: 276013
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.
These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.
llvm-svn: 275988
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
Differential Revision: https://reviews.llvm.org/D22106
llvm-svn: 275981
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.
Differential Revision: https://reviews.llvm.org/D21929
llvm-svn: 275979
The standard local dynamic model for TLS on ARM systems needs two
relocations:
- R_ARM_TLS_LDM32 (module idx)
- R_ARM_TLS_LDO32 (offset of object from origin of module TLS block)
In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to
produce these relocations.
llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm).
This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32.
TLS for ARM is defined in Addenda to, and Errata in, the ABI for the
ARM Architecture
Differential Revision: https://reviews.llvm.org/D22461
llvm-svn: 275977
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).
This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D22412
llvm-svn: 275967
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).
I added the new sequence (truncate (a >>n) to i1) to the BT pattern.
Differential Revision: https://reviews.llvm.org/D22354
llvm-svn: 275950
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
llvm-svn: 275934
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.
The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.
Differential Revision: https://reviews.llvm.org/D22428
llvm-svn: 275893
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.
Also fix crashes on 0 x and 1 x arrays.
llvm-svn: 275869
Schedule a load and its use in the same packet in MISched. Previously,
isResourceAvailable was returning false for dependences in the same
packet, which prevented MISched from packetizing a load and its use in
the same packet for v60.
Patch by Ikhlas Ajbar.
llvm-svn: 275804
This patch corresponds to review:
https://reviews.llvm.org/D21354
We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.
llvm-svn: 275796
The machine scheduler needs to account for available resources
more accurately in order to avoid scheduling an instruction that
forces a new packet to be created.
This occurs in two ways: First, an instruction without an available
resource may have a large priority due to other metrics and be
scheduled when there are other instructions with available resources.
Second, an instruction with a non-zero latency may become available
prematurely. In both these cases, we attempt change the priority
in order to allow a better instruction to be scheduled.
Patch by Brendon Cahoon.
llvm-svn: 275793
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.
Patch by Brendon Cahoon.
llvm-svn: 275790
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:
offset + frame index
when the frame is resolved.
By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.
For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.
Reviewers: dsanders, vkalintris
Differential Review: https://reviews.llvm.org/D21615
llvm-svn: 275786
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up most
shader stages haves some extra restrictions on the amount
of available LDS.
Reviewers: tstellarAMD, arsenm
Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D20728
llvm-svn: 275779
The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.
The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.
Fixes PR26038
Differential Revision: https://reviews.llvm.org/D22103
llvm-svn: 275776
At higher optimization levels, we generate the libcall for DIVREM_Ix, which is
fine: aeabi_{u|i}divmod. At -O0 we generate the one for REM_Ix, which is the
default {u}mod{q|h|s|d}i3.
This commit makes sure that we don't generate REM_Ix calls for ABIs that
don't support them (i.e. where we need to use DIVREM_Ix instead). This is
achieved by bailing out of FastISel, which can't handle non-double multi-reg
returns, and letting the legalization infrastructure expand the REM_Ix calls.
It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some
Windows checks to it to make sure we don't break things for it.
Fixes PR27068
Differential Revision: https://reviews.llvm.org/D21926
llvm-svn: 275773
r275042 reverted function-attribute inference for the 'returned' attribute
because the feature triggered self-hosting failures on ARM and AArch64. James
Molloy determined that the this-return argument forwarding feature, which
directly ties the returned input argument to the returned value, was the cause.
It seems likely that this forwarding code contains, or triggers, a subtle bug.
Disabling for now until we can track that down.
llvm-svn: 275677
Initializing them in LLVMInitializeARMTarget() makes them visible early
enough for "llc -run-pass usage".
This required the pass to be renamed from "arm-load-store-opt" to
"arm-ldst-opt", because there already exists an arm-load-store-opt
cl::opt switch which would now clash with the passname getting added as
a switch in opt. On the bright side the pass name now matches the
DEBUG_TYPE name. Renamed "arm-prera-load-store-opt" to
"arm-repra-ldst-opt" as well for consistency.
llvm-svn: 275661
In this situation:
%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2
The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.
llvm-svn: 275635
The same value for EM_BPF is being propagated to glibc,
elfutils, and binutils.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 275633
The Hexagon schedulers need to handle instructions with a latency
of 0 or 2 more accurately. The problem, in v60, is that a dependence
between two instructions with a 2 cycle latency can use a .cur version
of the source to achieve a 0 cycle latency when the use is in the
same packet. Any othe use, must be at least 2 packets later, or a
stall occurs. In other words, the compiler does not want to schedule
the dependent instructions 1 cycle later.
To achieve this, the latency adjustment code allows only a single
dependence to have a zero latency. All other instructions have the
other value, which is typically 2 cycles. We use a heuristic to
determine which instruction gets the 0 latency.
The Hexagon machine scheduler was also changed to increase the cost
associated with 0 latency dependences than can be scheduled in the
same packet.
Patch by Brendon Cahoon.
llvm-svn: 275625
This mostly just works.
Vectorcall rets are still not supported.
The win64_eh test change is because fast isel doesn't use rsi for temporary
computations, so it doesn't need to be pushed. The test case I'm changing was
originally added to test pushes, but by now there are other test cases in that
file exercising that code path.
https://reviews.llvm.org/D22422
llvm-svn: 275607
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
Summary:
Previously we took an unsigned.
Hooray for type-safety.
Reviewers: chandlerc
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D22282
llvm-svn: 275591
- Treat bitwise OR with a frame index as an ADD wherever possible, fold it
into addressing mode.
- Extend patterns for memops to allow memops with frame indexes as address
operands.
llvm-svn: 275569
Added emitting metadata to elf for runtime.
Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.
Differential Revision: https://reviews.llvm.org/D21849
llvm-svn: 275566
As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level.
This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen.
Fixes PR28136.
llvm-svn: 275543
Thumb-1 doesn't have post-inc or pre-inc load or store instructions. However the LDM/STM instructions with writeback can function as post-inc load/store:
ldm r0!, {r1} @ load from r0 into r1 and increment r0 by 4
Obviously, this only works if the post increment is 4.
llvm-svn: 275540
... When we emit several calls to the same function in the same basic block.
An indirect call uses a "BLX r0" instruction which has a 16-bit encoding. If many calls are made to the same target, this can enable significant code size reductions.
llvm-svn: 275537
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.
Fixes bug 28550
llvm-svn: 275510
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set COPY (W|X)ZR isAsCheapAsAMove.
Differential Revision: http://reviews.llvm.org/D22360
llvm-svn: 275503
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.
This was incorrectly reverted in rL275421 during triage of PR28552.
llvm-svn: 275497
On Hexagon is it legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.
llvm-svn: 275458
Summary:
Make the target-specific flags in MachineMemOperand::Flags real, bona
fide enum values. This simplifies users, prevents various constants
from going out of sync, and avoids the false sense of security provided
by declaring static members in classes and then forgetting to define
them inside of cpp files.
Reviewers: MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22372
llvm-svn: 275451
Summary:
- Give it a shorter name (because we're going to refer to it often from
SelectionDAG and friends).
- Split the flags and alignment into separate variables.
- Specialize FlagsEnumTraits for it, so we can do bitwise ops on it
without losing type information.
- Make some enum values constants in MachineMemOperand instead.
MOMaxBits should not be a valid Flag.
- Simplify some of the bitwise ops for dealing with Flags.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22281
llvm-svn: 275438
We were able to assemble, but not disassemble.
Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit
the uint8_t max. The control registers were already squarely above
it, but I don't think they ever go in .r/m, only in .reg.
I also did notice an extra REX.W in our encoding, but I think that's
fine.
llvm-svn: 275427
stdcall is callee-pop like thiscall, so the thiscall changes already did most
of the work for this. This change only opts stdcall in and adds tests.
llvm-svn: 275414
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.
llvm-svn: 275411
Summary:
It was recently discovered that, for Mips's SelectionDAGISel subclasses,
all optimization levels caused SelectionDAGISel to behave like -O2.
This change adds the necessary plumbing to initialize the optimization level.
Reviewers: andrew.w.kaylor
Subscribers: andrew.w.kaylor, sdardis, dean, llvm-commits, vradosavljevic, petarj, qcolombet, probinson, dsanders
Differential Revision: https://reviews.llvm.org/D14900
llvm-svn: 275410
Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations.
llvm-svn: 275406
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because less
negative offsets will be created. Negative offsets/immediates are not supported
by Thumb1 thus preventing more compact instruction encoding.
Differential Revision: http://reviews.llvm.org/D21183
llvm-svn: 275382
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.
2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.
Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk
Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D19904
llvm-svn: 275367
Avoid exposing a cl::opt in a public header and instead promote this
option in the API.
Alternatively, we could land the cl::opt in CommandFlags.h so that
it is available to every tool, but we would still have to find an
option for clang.
llvm-svn: 275348
This happens to make X86CallFrameOptimization in -O0 / FastISel builds as well,
but it's not clear if the pass should run in that setup.
http://reviews.llvm.org/D22314
llvm-svn: 275320
Summary:
v2: don't count SGPRs spilled to scratch twice
I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
[scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].
The fact SGPR spills add very high numbers to the scratch size make that
computation a guessing game, but I don't have a solution to that.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D22197
llvm-svn: 275288
We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.
One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505
Differential Revision: http://reviews.llvm.org/D22225
llvm-svn: 275276
- Add new TTI instruction checks
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant
llvm-svn: 275252
These patterns just extracted the source down to 128-bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases.
llvm-svn: 275240
This patch corresponds to review:
http://reviews.llvm.org/D20239
It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.
llvm-svn: 275215
With r274952 and r275201 in place there are no cases left where a
forward liveness analysis yields different results than a backward one.
So we can remove the forward stepping logic.
Differential Revision: http://reviews.llvm.org/D22083
llvm-svn: 275204
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.
Differential Revision: http://reviews.llvm.org/D22256
llvm-svn: 275178
This patch corresponds to review:
http://reviews.llvm.org/D21358
Vector shifts that have the same semantics as a vector swap are cannonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.
llvm-svn: 275168
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22217
llvm-svn: 275160
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr*, mainly by preferring MachineInstr& over MachineInstr* and
using range-based for loops.
llvm-svn: 275149
Avoid implicit iterator conversions from MachineInstrBundleIterator to
MachineInstr* in the Hexagon backend, mostly by preferring MachineInstr&
over MachineInstr* and switching to range-based for loops.
There's a long tail of API cleanup here, but I'm planning to leave the
rest to the Hexagon maintainers. HexagonInstrInfo defines many of its
own predicates, and most of them still take MachineInstr*. Some of
those actually check for nullptr, so I didn't feel comfortable changing
them to MachineInstr& en masse.
llvm-svn: 275142
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the Mips backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
llvm-svn: 275141
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the SystemZ backend, mainly by preferring MachineInstr&
over MachineInstr* and using range-based for loops.
llvm-svn: 275137
Immediate branch targets aren't commonly used, but if they are we should make
sure they can actually be encoded. This means they must be divisible by 2 when
targeting Thumb mode, and by 4 when targeting ARM mode.
Also do a little naming cleanup while I was changing everything around anyway.
llvm-svn: 275116
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.
The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22210
llvm-svn: 275113
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.
The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21551
llvm-svn: 275108
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).
Differential Revision: http://reviews.llvm.org/D22064
llvm-svn: 275106
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call sequence to be generated in two passes instead of one.
Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:
popl %edi
popl %edi
pushl %eax
jmp tailcallee # TAILCALL
To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.
Patch by Magnus Lång <margnus1@gmail.com>
Differential Revision: http://reviews.llvm.org/D21325
llvm-svn: 275103
It is an optimization pass, and should not run at -O0. Especially since Fast RA
will not do the required register coalescing anyway, so it's a loss even from
the optimization standpoint.
This also works around (but doesn't quite fix) PR28489.
llvm-svn: 275099
Summary: Add support for the z13 instructions LOCHI and LOCGHI which
conditionally load immediate values. Add target instruction info hooks so
that if conversion will allow predication of LHI/LGHI.
Author: RolandF
Reviewers: uweigand
Subscribers: zhanjunl
Commiting on behalf of Roland.
Differential Revision: http://reviews.llvm.org/D22117
llvm-svn: 275086
At present the only shuffle with a variable mask we recognise is PSHUFB, which influences if its worth the cost of mask creation/loading of a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet.
llvm-svn: 275059
Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32.
llvm-svn: 275028
This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions. A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.
Differential Revision: http://reviews.llvm.org/D21949
llvm-svn: 275016
Avoid implicit conversions from MachineInstrBundleIIterator to
MachineInstr* in the MSP430 backend by preferring MachineInstr& over
MachineInstr* when a pointer isn't nullable.
llvm-svn: 274933
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the NVPTX backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
There was one piece of questionable code in
NVPTXInstrInfo::AnalyzeBranch, where a condition checked a pointer
converted from an iterator for nullptr. Since this case is impossible
(moreover, the code above guarantees that the iterator is valid), I
removed the check when I changed the pointer to a reference.
Despite that case, there should be no functionality change here.
llvm-svn: 274931
Avoid implicit conversions from MachineInstrBundleInstr to MachineInstr*
in the AArch64 backend, mainly by preferring MachineInstr& over
MachineInstr* when a pointer isn't nullable.
llvm-svn: 274924
Remove remaining implicit conversions from MachineInstrBundleIterator to
MachineInstr* from the ARM backend. In most cases, I made them less attractive
by preferring MachineInstr& or using a ranged-based for loop.
Once all the backends are fixed I'll make the operator explicit so that this
doesn't bitrot back.
llvm-svn: 274920
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the WebAssembly backend by preferring MachineInstr&
over MachineInstr*.
llvm-svn: 274912
Remove remaining implicit conversions from MachineInstrBundleIterator to
MachineInstr* from the AMDGPU backend. In most cases, I made them less
attractive by preferring MachineInstr& or using a ranged-based for loop.
Once all the backends are fixed I'll make the operator explicit so that
this doesn't bitrot back.
llvm-svn: 274906
Change a while loop that was checking for nullptr on an
iterator-to-pointer conversion to an infinite for loop. Now it's clear
that the condition doesn't terminate.
The only change in behaviour is if an invalid iterator (holding nullptr)
was passed into AMDGPUCFGStructurizer::reversePredicateSetter. There
are only two callers, and they both dereference the iterator before
sending it in, so rather than adding an early return to avoid the loop
I've just asserted (using a static_cast, to avoid an implicit conversion
to pointer).
llvm-svn: 274902
Stop using an implicit conversion from the return of
MachineBasicBlock::getFirstTerminator to MachineInstr*. In two cases,
directly dereference to a MachineInstr& since later code assumes it's
valid. In a third case, change to an iterator since later code checks
against MachineBasicBlock::end.
Although the fix for the third case avoids undefined behaviour, I expect
this doesn't cause a functionality change in practice (since the basic
block already has a terminator).
llvm-svn: 274898
The commit message is inaccurate, modifiesRegister
will check for partial defs of exec.
We currently don't ever emit partial defs of exec,
so it doesn't really matter.
llvm-svn: 274886
Summary: Branch off the work to add support for the .word directive,
using addAliasForDirective.
Reviewers: koriakin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22142
llvm-svn: 274878
Errata fixes for various errata in different versions of the Leon variants of the Sparc 32 bit processor.
The nature of the errata are listed in the comments preceding the errata fix passes. Relevant unit tests are implemented for each of these.
Note: Running clang-format has changed a few other lines too, unrelated to the implemented errata fixes. These have been left in as this keeps the code formatting consistent.
Differential Revision: http://reviews.llvm.org/D21960
llvm-svn: 274856
Support for the macro fusion of simple ALU ops with branches for the Vulcan sub-target.
Patch by Meador Inge <meadori@gmail.com>
Differential Revision: http://reviews.llvm.org/D22042
llvm-svn: 274837
Until we have a better way to extract constants through bitcasted build vectors (and how to handle undefs of partial lanes etc.) at least accept build vectors that are all zeroes.
llvm-svn: 274833
Windows on ARM uses a pure thumb-2 environment. This means that it can select a
high register when doing a __builtin_longjmp. We would use a tLDRi which would
truncate the register to a low register. Use a t2LDRi12 to get the full
register file access. Tweak the code to just load into PC, as that is an
interworking branch on all supported cores anyways.
llvm-svn: 274815
Summary:
* Similiar to the ARM backend yse the peephole optimizer to generate more conditional ALU operations;
* Add predicated type with default always true to RR instructions in LanaiInstrInfo.td;
* Move LanaiSetflagAluCombiner into optimizeCompare;
* The ASM parser can currently only handle explicitly specified CC, so specify ".t" (true) where needed in the ASM test;
* Remove unused MachineOperand flags;
Reviewers: eliben
Subscribers: aemerson
Differential Revision: http://reviews.llvm.org/D22072
llvm-svn: 274807
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.
The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.
llvm-svn: 274802
The commit reinstates r273279, which was informally approved.
Original Review: http://reviews.llvm.org/D21414
This reverts commit ca632c91aaa7cafc50942f890c49f727a046ace1.
llvm-svn: 274790
- Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but
not all of these registers were already accessible via the nvvm
name.
- Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0.
There's a fair amount of code motion here, but it's all very
mechanical.
llvm-svn: 274769
Summary:
A regression showed up in node.js when handling conditional calls.
Fix the regression by recognizing external symbols as a possible
operand type in CallJG.
Reviewers: koriakin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22054
llvm-svn: 274761
This is a follow-up for r273544.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
This commit also removes a command line flag that isn't used in any of the tests:
check-vmlx-hazards. It can be replaced easily with the mattr mechanism, since
this is now a subtarget feature.
There is still some work left regarding FeatureExpandMLx. In the past MLx
expansion was enabled for subtargets with hasVFP2(), until r129775 [1] switched
from that to isCortexA9, without too much justification.
In spite of that, the code performing MLx expansion still contains calls to
isSwift/isLikeA9, although the results of those are pretty clear given that
we're only enabling it for the A9.
We should try to enable it for all targets that have FeatureHasVMLxHazards, as
it seems to be closely related to that behaviour, and if that is possible try to
clean up the MLx expansion pass from all calls to isWhatever. This will require
some performance testing, so it will be done in another patch.
[1] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110418/119725.html
Differential Revision: http://reviews.llvm.org/D21798
llvm-svn: 274742
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.
Differential Revision: http://reviews.llvm.org/D21774
llvm-svn: 274692
On CPUs with the zero cycle zeroing feature enabled "movi v.2d" should
be used to zero a vector register. This was previously done at
instruction selection time, however the register coalescer sometimes
widened multiple vregs to the Q width because of that leading to extra
spills. This patch leaves the decision on how to zero a register to the
AsmPrinter phase where it doesn't affect register allocation anymore.
This patch also sets isAsCheapAsAMove=1 on FMOVS0, FMOVD0.
This fixes http://llvm.org/PR27454, rdar://25866262
Differential Revision: http://reviews.llvm.org/D21826
llvm-svn: 274686
findScratchNonCalleeSaveRegister() just needs a simple liveness
analysis, use LivePhysRegs for that as it is simpler and does not depend
on the kill flags.
This commit adds a convenience function available() to LivePhysRegs:
This function returns true if the given register is not reserved and
neither the register nor any of its aliases are alive.
Differential Revision: http://reviews.llvm.org/D21865
llvm-svn: 274685
This is "cvtdq2ps" which does not appear to be particularly slow on any CPU
according to Agner's tables. Choosing "5" as a cost here as suggested in:
https://llvm.org/bugs/show_bug.cgi?id=21356
...but it seems very conservative given that the instruction is fully pipelined,
and I think these costs are supposed to model throughput.
Note that related costs are also most likely too high, but this fixes PR21356
and partly fixes PR28434.
llvm-svn: 274658
Cast cost tables are now sorted, for each cast type, lexicographically on
[source base type, source vector width, dest base type, base vector width].
llvm-svn: 274653
On SystemZ, shift and rotate instructions only use the bottom 6 bits of the shift/rotate amount.
Therefore, if the amount is ANDed with an immediate mask that has all of the bottom 6 bits set, we
can remove the AND operation entirely.
Differential Revision: http://reviews.llvm.org/D21854
llvm-svn: 274650
We were checking for 2 insertions (which is caught earlier in the pattern matching loop) instead of the case where we have no insertions.
Turns out this code never fires as we always try to lower to insertps after trying to lower to blendps, which would catch these cases - I'm about to make some changes to support combining to insertps which could cause this to fire so I don't want to remove it.
llvm-svn: 274648
There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.
The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.
This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).
Test case based on the original problem reported.
Phabricator Review: http://reviews.llvm.org/D21802
llvm-svn: 274645
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast gets split, the resulting casts
end up legal and cheap.
Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.
Differential Revision: http://reviews.llvm.org/D21251
llvm-svn: 274642
This is a follow-up for r273544.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
This commit also removes two command-line flags that weren't used in any of the
tests: widen-vmovs and swift-partial-update-clearance. The former may be easily
replaced with the mattr mechanism, but the latter may not (as it is a subtarget
property, and not a proper feature).
Differential Revision: http://reviews.llvm.org/D21797
llvm-svn: 274620
This is a follow-up for r273544 and r273853.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
This commit also marks them as obsolete.
Differential Revision: http://reviews.llvm.org/D21796
llvm-svn: 274616
The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc".
I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction.
I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions.
This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173.
Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails).
Differential revision: http://reviews.llvm.org/D21956
llvm-svn: 274613
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21975
llvm-svn: 274612
I think the Ops filled out by Regex::match contain pointers into the temporary
std::string returned by StringRef::upper. Its lifetime is extended by the call
to match, but only until the end of that call (not to the uses of Ops later
on).
llvm-svn: 274586
Registers are printed a lot, so don't create temporary
std::strings. Using char instead of a string to an ostream
saves a function call.
llvm-svn: 274581
The way the named arguments for various system instructions are handled at the
moment has a few problems:
- Large-scale duplication between AArch64BaseInfo.h and AArch64BaseInfo.cpp
- That weird Mapping class that I have no idea what I was on when I thought
it was a good idea.
- Searches are performed linearly through the entire list.
- We print absolutely all registers in upper-case, even though some are
canonically mixed case (SPSel for example).
- The ARM ARM specifies sysregs in terms of 5 fields, but those are relegated
to comments in our implementation, with a slightly opaque hex value
indicating the canonical encoding LLVM will use.
This adds a new TableGen backend to produce efficiently searchable tables, and
switches AArch64 over to using that infrastructure.
llvm-svn: 274576
This reverts commit r259387 because it inserts illegal code after legalization
in some backends where i64 OR type is illegal for example.
llvm-svn: 274573
Not all code-paths set the relocation model to static for Windows. This
currently breaks on Windows ARM with `-mlong-calls` when built with clang.
Loosen the assertion to what it was previously. We would ideally ensure that
all the configuration sets Windows to static relocation model.
llvm-svn: 274570
The other use really does only care about the SDNode (it checks the
opcode against a whitelist), but bitFieldPlacement can be misled if
the node produces multiple results.
Patch by Ismail Badawi.
llvm-svn: 274567
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.
r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.
llvm-svn: 274564
Summary:
These have been replaced with TableGen code (except for isConstantLoad,
which is still used for R600). The queries were broken for cases
where MemOperand was a PseudoSourceValue.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21684
llvm-svn: 274561
The important thing I was missing was ensuring newly added constants were kept in topological order. Repositioning the node is correct if the constant is newly added (so it has no topological ordering) but wrong if it already existed - positioning it next in the worklist would break the topological ordering.
Original commit message:
[Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;
int i(int a) {
return a & 0xfffffeec;
}
Used to produce:
ldr r1, [CONSTPOOL]
ands r0, r1
CONSTPOOL: 0xfffffeec
And now produces:
movs r1, #255
adds r1, #20 ; Less costly immediate generation
bics r0, r1
llvm-svn: 274543
This patch corresponds to review:
http://reviews.llvm.org/D20443
It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).
llvm-svn: 274535
Summary:
The isGlobalLoad() query was returning true for constant address space loads
with memory types less than 32-bits, which is wrong. This logic has been
replaced with PatFrag in the TableGen files, to provide the same functionality.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21696
llvm-svn: 274521
We were using DAG->getConstant instead of DAG->getTargetConstant. This meant that we could inadvertently increase the use count of a constant if stars aligned, which it did in this testcase. Increasing the use count of the constant could cause ISel to fall over (because DAGToDAG lowering assumed the constant had only one use!)
Original commit message:
[Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;
int i(int a) {
return a & 0xfffffeec;
}
Used to produce:
ldr r1, [CONSTPOOL]
ands r0, r1
CONSTPOOL: 0xfffffeec
And now produces:
movs r1, #255
adds r1, #20 ; Less costly immediate generation
bics r0, r1
llvm-svn: 274510
This patch adds support for including the avx512 mask register information in the mask/maskz versions of shuffle instruction comments.
This initial version just adds support for MOVDDUP/MOVSHDUP/MOVSLDUP to reduce the mass of test regenerations, other shuffle instructions can be added in due course.
Differential Revision: http://reviews.llvm.org/D21953
llvm-svn: 274459
This could of course be a simple binary search with no global state
involved at all if someone cares enough. Just don't make everyone
linking the hexagon backend pay for it on process startup and shutdown.
llvm-svn: 274437
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.
llvm-svn: 274397
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.
We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.
llvm-svn: 274394
Enable testing different scheduling variants if sgpr usage
is very high. It was previously disabled because of a bug
in handleMove, but it has been fixed since.
Patch by Axel Davy
llvm-svn: 274372
Summary: The code generation should be independent of the debug info.
Reviewers: zansari, davidxl, mkuper, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D21911
llvm-svn: 274357
No functional changes. Just created wrapper classes around the 3
and 4 reg mult and mac instruction classes.
Differential Revision: http://reviews.llvm.org/D21549
llvm-svn: 274347
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.
Check calling convention in AMDGPUMachineFunction::isKernel
This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.
Also, in the future unused non-kernels may be optimized.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19917
llvm-svn: 274341
Summary: dst_sel and dst_unused disabled for VOPC as they have no effect on result
Reviewers: artem.tamazov, tstellarAMD, vpykhtin
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D21376
llvm-svn: 274340
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.
llvm-svn: 274337
Change all the methods in LiveVariables that expect non-null
MachineInstr* to take MachineInstr& and update the call sites. This
clarifies the API, and designs away a class of iterator to pointer
implicit conversions.
llvm-svn: 274319
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator. One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.
llvm-svn: 274304
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr. In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
llvm-svn: 274287
This processor feature had been left out by mistake from the z13
ProcessorModel.
This time with updated test case. Thanks, Hans.
Reviewed by Ulrich Weigand.
llvm-svn: 274216
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr. This is a
general API improvement.
Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other. Instead I've done everything as a block and just
updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&`
operators. The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy. I couldn't run tests
for AVR since llc doesn't link with it turned on.
llvm-svn: 274189
[x86] (PR15455) While (ins|outs)[bwld] instructions do not take %dx as a
memory operand, various unofficial references do and objdump
disassembles to this format. Extend special treatment of
similar (in|out)[bwld] operations.
Reviewers: craig.topper, rnk, ab
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18837
llvm-svn: 274152
When lowering two blended PACKUS, we used to disregard the types
of the PACKUS inputs, indiscriminately generating a v16i8 PACKUS.
This leads to non-selectable things like:
(v16i8 (PACKUS (v4i32 v0), (v4i32 v1)))
Instead, check that the PACKUSes have the same type, and use that
as the final result type.
llvm-svn: 274138
Summary:
This fixes bug: https://llvm.org/bugs/show_bug.cgi?id=28282
Currently the cost model of constant hoisting checks the bit width of the data type of the constants.
However, the actual immediate value is small enough and not need to be hoisted.
This patch checks for the actual bit width needed for the constant.
Reviewers: t.p.northover, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D21668
llvm-svn: 274073
Summary: LLVM assumes that large clearance will hide the partial register spill penalty. But in our experiment, 16 clearance is too small. As the inserted XOR is normally fairly cheap, we should have a higher clearance threshold to aggressively insert XORs that is necessary to break partial register dependency.
Reviewers: wmi, davidxl, stoklund, zansari, myatsina, RKSimon, DavidKreitzer, mkuper, joerg, spatel
Subscribers: davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21560
llvm-svn: 274068
Summary: SystemZ shift instructions only use the last 6 bits of the shift
amount. When the result of an AND operation is used as a shift amount, this
means that we can use the NILL instruction (which operates on the last 16 bits)
rather than NILF (which operates on the last 32 bits) for a 16-bit savings in
instruction size.
Reviewers: uweigand
Subscribers: llvm-commits
Author: colpell
Committing on behalf of Elliot.
Differential Revision: http://reviews.llvm.org/D21686
llvm-svn: 274066
I think this converts all the simple cases that really just care about
the generated code being position independent or not. The remaining
uses are a bit more complicated and are checking things like "is this
a library or executable" or "can this symbol be preempted".
llvm-svn: 274055