The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
llvm-svn: 263383
This patch corresponds to review:
http://reviews.llvm.org/D17712
We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).
llvm-svn: 263338
cmpxchg[8|16]b uses RBX as one of its argument.
In other words, using this instruction clobbers RBX as it is defined to hold one
the input. When the backend uses dynamically allocated stack, RBX is used as a
reserved register for the base pointer.
Reserved registers have special semantic that only the target understands and
enforces, because of that, the register allocator don’t use them, but also,
don’t try to make sure they are used properly (remember it does not know how
they are supposed to be used).
Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.
To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that save rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
comparison.
- The other is the virtual register holding the save of the value of RBX as the
base pointer. This saving is done as part of isel (i.e., we emit a copy from
rbx).
cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp
Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferes that appears after the pseudo gets expanded are properly modeled
before that expansion.
This fixes PR26883.
llvm-svn: 263325
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:15:50 2016 +0000
Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
91177308-0d34-0410-b5e6-96231b3b80d8
until we can figure out what to do about clang and Release build testing.
This reverts commit 263258.
llvm-svn: 263321
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 263303
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
Its not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases.
llvm-svn: 263239
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.
Differential Revision: http://reviews.llvm.org/D17984
llvm-svn: 263212
tests to run GVN in both modes.
This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.
Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.
I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.
Differential Revision: http://reviews.llvm.org/D18019
llvm-svn: 263208
Frontend authors are strongly encouraged to keep allocas
in the entry block, so don't bother visiting every instruction
in the other blocks of the function.
llvm-svn: 263206
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.
llvm-svn: 263167
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.
The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.
For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.
BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17277
llvm-svn: 263140
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.
This fixes PR26711.
Differential Revision: http://reviews.llvm.org/D18029
llvm-svn: 263139
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test and branch sequences like in the example below. This handles
the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC
Examples:
and w8, w8, #0x400
cbnz w8, L1
to
tbnz w8, #10, L1
Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17942
llvm-svn: 263136
This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.
Patch by Pablo Barrio.
Differential Revision: http://reviews.llvm.org/D17925
llvm-svn: 263132
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct. The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node. This node set R7 as clobbered. However, the
code would then follow up with a clobber of R11. I had failed to notice the
imp-def,kill on R7 in the isel. Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if. Instead, construct a new parallel
WIN node and prefer that when targeting windows. This ensures that we now both
correctly model the __builtin_longjmp as well as construct the frame in a more
ABI conformant manner.
llvm-svn: 263123
WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast
to AAPCS which states r7. This adjusts __builtin_longjmp to not clobber r7 and
to properly restore the frame pointer on execution.
llvm-svn: 263118
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.
This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.
There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.
Differential Revision: http://reviews.llvm.org/D17858
llvm-svn: 263102
Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on".
Differential Revision: http://reviews.llvm.org/D17994
llvm-svn: 263097
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.
There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.
Differential Revision: http://reviews.llvm.org/D18016
llvm-svn: 263092
The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.
Differential Revision: http://reviews.llvm.org/D18000
llvm-svn: 263069
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.
Differential Revision: http://reviews.llvm.org/D17899
llvm-svn: 263067
This will allow inline assembler code to utilize these features, but no automatic lowering is provided, except for the previously provided @llvm.trap, which lowers to "ta 5".
The change also separates out the different assembly language syntaxes for V8 and V9 Sparc. Previously, only V9 Sparc assembly syntax was provided.
The change also corrects the selection order of trap disassembly, allowing, e.g. "ta %g0 + 15" to be rendered, more readably, as "ta 15", ignoring the %g0 register. This is per the sparc v8 and v9 manuals.
Check-in includes many extra unit tests to check this works correctly on both V8 and V9 Sparc processors.
Code Reviewed at http://reviews.llvm.org/D17960.
llvm-svn: 263044
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.
ToDo:
- VOP2bInst:
- vcc is considered as operand
- AsmMatcher doesn't apply mnemonic aliases when parsing operands
- v_mac_f32
- v_nop
- disable instructions with 64-bit operands
- change dpp_ctrl assembler representation to conform sp3
Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to.
Differential Revision: http://reviews.llvm.org/D17888
llvm-svn: 263005
This implements a very simple conservative transformation that doesn't
require more than linear code size growth. There's room for much more
optimization in this space.
llvm-svn: 262982
The fix consisting in using the library call for atomic compare and swap when
the instruction is not safe to use may be incorrect. Indeed the library call may
not exist on all platform. In other words, we need a better fix!
llvm-svn: 262943
Until now curly braces could only be used in MS inline assembly to mark block start/end.
All curly braces were removed completely at a very early stage.
This approach caused bugs like:
"m{o}v eax, ebx" turned into "mov eax, ebx" without any error.
In addition, AVX-512 added special operands (e.g., k registers), which are also surrounded by curly braces that mark them as such.
Now, we need to keep the curly braces and identify at a later stage if they are marking block start/end (if so, ignore them), or surrounding special AVX-512 operands (if so, parse them as such).
This patch fixes the bug described above and enables the use of AVX-512 special operands.
This commit is the the llvm part of the patch.
The clang part of the review is: http://reviews.llvm.org/D17766
The llvm part of the review is: http://reviews.llvm.org/D17767
Differential Revision: http://reviews.llvm.org/D17767
llvm-svn: 262843
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.
This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed.
Removed non-constant pool mask decode path as we have no way of testing it right now.
Differential Revision: http://reviews.llvm.org/D17916
llvm-svn: 262809
Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.
Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization
Followup to D17681
llvm-svn: 262784
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE.
Differential Revision: http://reviews.llvm.org/D17683
llvm-svn: 262782
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such pointer
gets defined by the CGBR pass.
This fixes PR26742.
llvm-svn: 262762
cmpxchgXXb uses RBX as one of its implicit argument. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, expect when RBX is a reserved register because in that case,
the register allocator will not track its value and will not
save and restore it when interferences occur.
rdar://problem/24851412
llvm-svn: 262759
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.
There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.
To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:
pop %ecx ; return address is now in %ecx
add $n, %esp ; clean the stack
push %ecx ; bring the return address back on the stack
ret ; pop the return address and jmp to it's value
llvm-svn: 262755
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262738
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17592
llvm-svn: 262732
Summary:
This allows us to use virtual registers when we need extra registers
for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex().
Once all the frame indices have been eliminated, the
PrologEpilogueInserter does an extra pass over the program to replace
all virtual registers with physical ones.
This allows us to make more efficient use of our emergency spill slots,
so we only need to create one.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17591
llvm-svn: 262728
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.
Initial change by Nicolai H.hnle
Differential Revision: http://reviews.llvm.org/D17401
llvm-svn: 262701
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.
This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.
Differential Revision: http://reviews.llvm.org/D17681
llvm-svn: 262635
Fixed the ordering to check first for X86 interrupt handler then for MCU target.
Differential Revision: http://reviews.llvm.org/D17801
llvm-svn: 262628
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).
Since:
r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.
Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.
One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).
Differential Revision: http://reviews.llvm.org/D17041
llvm-svn: 262627
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 262599
Patch by: Konstantin Zhuravlyov
Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.
Reviewers: tstellarAMD, arsenm
Subscribers: echristo, dblaikie, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17454
llvm-svn: 262579
Summary:
When there were no free SGPRs, we were trying to move this value into
some of the reserved registers which was causing a segmentation fault.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17590
llvm-svn: 262577
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.
Differential Revision: http://reviews.llvm.org/D17786
llvm-svn: 262575
Catch objects with a displacement of zero do not initialize a catch
object. The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.
If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.
Address this by creating our catch objects as fixed objects. We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.
Differential Revision: http://reviews.llvm.org/D17823
llvm-svn: 262546
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262507
This reverts commit r262370.
It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404
The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.
llvm-svn: 262505
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.
These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.
So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.
Differential Revision: http://reviews.llvm.org/D17015
llvm-svn: 262504
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.
This change should be NFC when compiling via clang without
-fomit-frame-pointer.
Reviewers: t.p.northover
Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines
Differential Revision: http://reviews.llvm.org/D17730
llvm-svn: 262495
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.
This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.
It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.
Differential Revision: http://reviews.llvm.org/D17680
llvm-svn: 262478
This is going to be used in .hsatext disassembler and can be used
in current assembler parser (lit tests passed on parsing).
Code using this helpers isn't included in this patch.
Benefits:
unified approach
fast field name lookup on parsing
Later I would like to enhance some of the field naming/syntax using this code.
Patch by: Valery Pykhtin
Differential Revision: http://reviews.llvm.org/D17150
llvm-svn: 262473
We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS.
While technically correct, this is not be desirable for folks who want
to examine aspects of the FLAGS register which are not related to
computation like whether or not CPUID is a valid instruction.
Differential Revision: http://reviews.llvm.org/D17782
llvm-svn: 262465
For some instructions the register is not the last operand and the immediate handling had to detect this and hardcode the index to find it. It also required CurOp to be pointing at the last operand handled in the Form switch whereas for any instruction it would be pointing at the next operand.
Now we just capture the value in the Form switch when we know exactly where it is and the CurOp pointer can behave normally.
llvm-svn: 262462
Fix checking the same instruction twice instead of the
second branch that uses vccz. I don't think this matters
currently because s_branch_vccnz is always used currently.
llvm-svn: 262457
This adds some missing generic schedule info definitions, enables
completeness checking for cyclone and fixes a typo uncovered by that.
Differential Revision: http://reviews.llvm.org/D17748
llvm-svn: 262393
This isn't quite NFC because some of the SDLocs may change which could
cause scheduling differences. But no regression tests are affected and
there is no functional change intended.
llvm-svn: 262391
Revert r262248 in an attempt to fix the clang-native-aarch64-full
bot and to investigate a performance regression in
SingleSource/Benchmarks/CoyoteBench/huffbench
llvm-svn: 262388
This reverts commit r262316.
It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.
llvm-svn: 262387
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:
- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model
Typical steps necessary to complete a model:
- Ensure all pseudo instructions that are expanded before machine
scheduling (usually everything handled with EmitYYY() functions in
XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.
Differential Revision: http://reviews.llvm.org/D17747
llvm-svn: 262384
Summary:
Tablegen was unable to determine that param loads/stores were actually
reading or writing from memory. I think this isn't a problem in
practice for param stores, because those occur in a block right before
we make our call. But param loads don't have to at the very beginning
of a function, so should be annotated as mayLoad so we don't incorrectly
optimize them.
Reviewers: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D17471
llvm-svn: 262381
Summary: Looks like this was caused by a typo.
Reviewers: jholewinski
Subscribers: jholewinski, llvm-commits, tra
Differential Revision: http://reviews.llvm.org/D17357
llvm-svn: 262380
Summary:
Calls sometimes need to be convergent. This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.
Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs. But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.
Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
Reviewers: jholewinski
Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17423
llvm-svn: 262373
Summary:
Also simplify some of the embedded C++ logic.
No functional changes.
Reviewers: jholewinski
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: http://reviews.llvm.org/D17354
llvm-svn: 262371
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations. However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is fixed size and smaller than the probe
size.
This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cummulative ~133 KB.
Differential Revision: http://reviews.llvm.org/D17679
llvm-svn: 262370
Summary:
This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D17614
llvm-svn: 262356
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.
void g(signed short);
void f(unsigned short x) {
g(x);
}
llvm-svn: 262352
Idea behind this change is to make code shorter and as much common for all targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out.
64bit instructions decoding added.
Error\warning messages on unrecognized instructions operands added, InstPrinter allowed to print invalid operands helping to find invalid/unsupported code.
The change is massive and hard to compare with previous version, so it makes sense just to take a look on the new version. As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles >300K binary source of some blas kernel.
Previous TODOs were saved whenever possible.
Patch by: Valery Pykhtin
Differential Revision: http://reviews.llvm.org/D17720
llvm-svn: 262332
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.
The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D10970
llvm-svn: 262316
Previosy, if actual instruction have one of optional operands then other optional operands listed before this also should be presented.
For example instruction v_fract_f32 v0, v1, mul:2 have one optional operand - OMod and do not have optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString:
string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod";
Making this work required some hacks (both OMod and Clamp match classes have same PredicateMethod).
Now, if MatchInstructionImpl meets formal optional operand that is not presented in actual instruction it skips this formal operand and tries to match current actual operand with next formal.
Patch by: Sam Kolton
Review: http://reviews.llvm.org/D17568
[AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods
With this change you should place optional operands in order specified by asm string:
clamp -> omod
offset -> glc -> slc -> tfe
Fixes for several tests.
Depends on D17568
Patch by: Sam Kolton
Review: http://reviews.llvm.org/D17644
llvm-svn: 262314
Technically you aren't supposed to emit these after type legalization
for some reason, and we use vector extracts of bitcasted integers
as the canonical way to do this.
llvm-svn: 262298
This currently does not have the control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.
But in most situations we do want the sinking to occur.
llvm-svn: 262296
This is long-standing dirtiness, as acknowledged by r77582:
The current trick is to select it into a merge_values with
the first definition being an implicit_def. The proper solution is
to add new ISD opcodes for the no-output variant.
Doing this before selection will let us combine away some constructs.
Differential Revision: http://reviews.llvm.org/D17659
llvm-svn: 262244
32-bit X86 EH on Windows utilizes a stack of registration nodes
allocated and deallocated on entry/exit. A registration node contains a
bunch of EH personality specific information like which try-state we are
currently in.
Because a setjmp target allows control flow from arbitrary program
points, there is no way to ensure that the try-state we are in is
correctly updated once we transfer control.
MSVC compatible compilers, like MSVC and ICC, utilize runtime helpers to
reinitialize the try-state when a longjmp occurs. This is implemented
by adding additional arguments to _setjmp3: the desired try-state and
a helper routine to update the try-state.
Differential Revision: http://reviews.llvm.org/D17721
llvm-svn: 262241
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592
This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.
llvm-svn: 262233
Summary:
The bug was that dextu's operand 3 would print 0-31 instead of 32-63 when
printing assembly. This came up when replacing
MipsInstPrinter::printUnsignedImm() with a version that could handle arbitrary
bit widths.
MipsAsmPrinter::printUnsignedImm*() don't seem to be used so they have been
removed.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15521
llvm-svn: 262231
Summary:
Previously, it would always select DEXT and substitute any invalid matches
for DEXTU/DEXTM during MipsMCCodeEmitter::encodeInstruction(). This works
but causes problems when adding range checked immediates to IAS.
Now isel selects the correct variant up front.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16810
llvm-svn: 262229
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.
This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.
llvm-svn: 262153
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
llvm-svn: 262149
In all but one case, change the DFAPacketizer API to take MachineInstr&
instead of MachineInstr*. In DFAPacketizer::endPacket(), take
MachineBasicBlock::iterator. Besides cleaning up the API, this is in
search of PR26753.
llvm-svn: 262142
Update APIs in MachineInstrBundle.h to take and return MachineInstr&
instead of MachineInstr* when the instruction cannot be null. Besides
being a nice cleanup, this is tacking toward a fix for PR26753.
llvm-svn: 262141
These are all co-processor registers, with the exception of the floating-point deferred-trap queue register.
Although these will not be lowered automatically by any instructions, it allows the use of co-processor
instructions implemented by inline-assembly.
Code Reviewed at http://reviews.llvm.org/D17133, with the exception of a very small change in brace placement in SparcInstrInfo.td,
which was formerly causing a problem in the disassembly of the %fq register.
llvm-svn: 262133
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.
Also introduces new intrinsics for each of s_memtime / s_memrealtime.
llvm-svn: 262119
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
never null, so this cleans up the API a bit. It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.
llvm-svn: 262115
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode
Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
Shift and negate is what InstCombine appears to prefer, so I've started with that pattern.
Note that the 'pcmpeq' instructions are always generating the negative one for the actual
'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?).
Differential Revision: http://reviews.llvm.org/D17630
llvm-svn: 262036
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.
Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
Instead of the convoluted if-statment we can just use getColor. This also fixes
a bug where we relied upon the parity of tablegen-generated register indexes
(instead of using the machine encoding).
llvm-svn: 261990
Currently aligned is what is being used so remove the redundant patterns for the unaligned versions. But don't do this for the byte and word vector types since they don't have aligned versions.
llvm-svn: 261985
Summary:
Avoid special case for FP, LR CFI emission and just allow general
AArch64FrameLowering::emitCalleeSavedFrameMoves() to handle them. Also,
stop recalculating the stack offsets in emitCalleeSavedFrameMoves()
since we can just reuse the previously calculated offset stored in the
MachineFrameInfo.
Depends on D17000
Reviewers: t.p.northover, rengolin, mcrosier, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17004
llvm-svn: 261885
Support all instructions with VOP1 encoding with 32 or 64-bit operands for VI subtarget:
VGPR_32 and VReg_64 operand register classes
VS_32 and VS_64 operand register classes with inline and literal constants
Tests for VOP1 instructions.
Patch by: skolton
Reviewers: arsenm, tstellarAMD
Review: http://reviews.llvm.org/D17194
llvm-svn: 261878
Resubmit with index problem fixed. Verified with valgrind.
Prepare to support DPP encodings.
For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands.
However this means that when parsing instruction which has no mnemonic prefix,
we cannot add both default values for VOP3 and for DPP optional operands
to OperandVector - neither instructions would match. So add default values
for optional operands to MCInst during conversion instead.
Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.
Review: http://reviews.llvm.org/D17445
llvm-svn: 261856
Part 2 of 2
This patch add support for combining target shuffles into blends-with-zero.
Differential Revision: http://reviews.llvm.org/D17483
llvm-svn: 261745
Part 1 of 2
This patch attempts to replace the insertion of zero scalars with a vector blend with zero, avoiding the use of the integer insertion instructions (which are particularly slow on many targets).
(Part 2 will add support for combining multiple blends-with-zero).
Differential Revision: http://reviews.llvm.org/D17483
llvm-svn: 261743
Prepare to support DPP encodings.
For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead.
Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.
Review: http://reviews.llvm.org/D17445
Reviewers: tstellarAMD
llvm-svn: 261742
lit tests passed before and after because it doesn't test the binary representation of amd_kernel_code_t.
Patch by: Valery Pykhtin (Valery.Pykhtin@amd.com)
Reviewers: arsenm
llvm-svn: 261732
PerformShuffleCombine should be usable by unary and binary target shuffles, but was attempting to get the first two operands whatever the instruction type. Since these are only used for VECTOR_SHUFFLE instructions for one particular combine I've moved them inside the relevant if statement.
llvm-svn: 261727
This function is used in exactly one place, and only in asserts
builds. Move it a few lines up before the use and only define it when
asserts are enabled. Fixes the release build under -Werror.
Also remove the forward declaration and commentary that was basically
identical to the code itself.
llvm-svn: 261722
Looks like the global rename last year was a bit over-zealous. These things
really are referred to with ARM64 elsewhere (ld64, libunwind, ...).
llvm-svn: 261698
We were emitting only one half of a the paired relocations needed for these
instructions because we decided that an offset needed a scattered relocation.
In fact, movw/movt relocations can be paired without being scattered.
llvm-svn: 261679
Summary:
Currently, the ARM Constant Island may not converge (or not converge quickly).
This patch let it move to the closest water after the user if it doesn't converge after 15 iterations.
This address https://llvm.org/bugs/show_bug.cgi?id=25339
Reviewers: t.p.northover, srhines, kristof.beyls, aadg, rengolin
Subscribers: weimingz, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D16890
llvm-svn: 261665
Implements a mostly-conventional redzone for the userspace
stack. Because we have unsigned load/store offsets we continue to use a
local SP subtracted from the incoming SP but do not write it back to
memory.
Differential Revision: http://reviews.llvm.org/D17525
llvm-svn: 261662
Summary:
Fix a bug in epilog generation where the incoming stack arguments were
not being popped for fastcc functions when -tailcallopt was passed.
Reviewers: t.p.northover, mcrosier, jmolloy, rengolin
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16894
llvm-svn: 261650
Summary:
If we want classify OoO or not, using getSchedModel().isOutOfOrder()
could be more proper way than using Subtarget->isLikeA9().
Reviewers: jmolloy, rengolin
Differential Revision: http://reviews.llvm.org/D17433
llvm-svn: 261623
src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64).
src0 and src1 of s_bfm_b64 are 32-bit.
Update tests.
Review: http://reviews.llvm.org/D17480
Reviewers: arsenm
llvm-svn: 261621
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
llvm-svn: 261605
Previously the stack pointer was only written back to memory in the
prolog. But this is wrong for dynamic allocas, for which
target-independent codegen handles SP updates after the prolog (and
possibly even in another BB). Instead update the SP global in
ADJCALLSTACKDOWN which is generated after the SP update sequence.
This will have further refinements when we add red zone support.
llvm-svn: 261579
This is a little embarrassing.
When I reverted r261504 (getIterator() => getInstrIterator()) in
r261567, I did a `git grep` to see if there were new calls to
`getInstrIterator()` that I needed to migrate. There were 10-20 hits,
and I blindly did a `sed ...` before calling `ninja check`.
However, these were `MachineInstrBundleIterator::getInstrIterator()`,
which predated r261567. Perhaps coincidentally, these had an identical
name and return type.
This commit undoes my careless sed and restores
`MachineBasicBlock::iterator::getInstrIterator()`.
llvm-svn: 261577
LLVM converts adds into ors when it can prove that the operands don't share
any non-zero bits. Teach address folding to recognize or instructions with
constant operands with this property that can be folded into addresses as
if they were adds.
llvm-svn: 261562
This is what was meant to be in the initial commit to fix this bug. The
parens were missing. This commit also adds a test case for the bug and
has undergone full testing on PPC and X86.
llvm-svn: 261546
This reverts commit r261510, effectively reapplying r261509. The
original commit missed a caller in AArch64ConditionalCompares.
Original commit message:
Pass non-null arguments by reference in MachineTraceMetrics::Trace,
simplifying future work to remove implicit iterator => pointer
conversions.
llvm-svn: 261511
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.
- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator(). This matches the
naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator(). This is explicitly called
"bundle" (not matching MachineBasicBlock) to disintinguish it clearly
from ilist_node::getIterator().
- Update all calls. Some of these I switched to `auto` to remove
boiler-plate, since the new name is clear about the type.
There was one call I updated that looked fishy, but it wasn't clear what
the right answer was. This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.
llvm-svn: 261504
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
llvm-svn: 261498
Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements.
Differential Revision: http://reviews.llvm.org/D17297
llvm-svn: 261490
Summary:
- Rename `"skylake"` == SkylakeServerProc to `"skylake-avx512"`
- Change `"skylake"` to denote SkylakeClientProc
- Fix the detection of cpu family 6 and model 94 to be
SkylakeClientProc instead of SkylakeServerProc
- Remove the `"cnl"` for CannonLake
Reviewers: craig.topper, delena
Subscribers: zansari, echristo, qcolombet, RKSimon, spatel, DavidKreitzer, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17090
llvm-svn: 261482
COFF doesn't have sections with mergeable contents. Instead, each
constant pool entry ends up in a COMDAT section. The linker, when
choosing between COMDAT sections, doesn't choose the max alignment of
the two sections. You just get whatever alignment was on the section.
If one constant needed a higher alignment in one object file from
another one, then we will get into trouble if the linker chooses the
lower alignment one.
Instead, lets promote the alignment of the constant pool entry to make
sure we don't use an under aligned constant with an instruction which
assumed otherwise.
This fixes PR26680.
llvm-svn: 261462
The stack pointer is bumped when there is a frame pointer or when there
are static-size objects, but was only getting written back when there
were static-size objects.
llvm-svn: 261453
The patch has a necessary call to a function inside an assert. Which is fine
when you have asserts turned on. Not so much when they're off. Sorry about
the regression.
llvm-svn: 261447
This patch corresponds to review:
http://reviews.llvm.org/D17294
It ensures that whatever block we are emitting the prologue/epilogue into, we
have the necessary scratch registers. It takes away the hard-coded register
numbers for use as scratch registers as registers that are guaranteed to be
available in the function prologue/epilogue are not guaranteed to be available
within the function body. Since we shrink-wrap, the prologue/epilogue may end
up in the function body.
llvm-svn: 261441
Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used combineX86ShuffleChain failed to take this into account and still referenced the first input.
llvm-svn: 261434
First small step towards fixing PR26667 - we need to ensure that combineX86ShuffleChain only gets called with a valid shuffle input node (a similar issue was found in D17041).
llvm-svn: 261433
TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent
shrink-wrapping from pushing prologue/epilogue past them (which result
in TLS variables being accessed before the stack frame is set up), we
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!
llvm-svn: 261387
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.
This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.
This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.
Reviewers: arsenm, cfang, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17305
llvm-svn: 261385
Summary:
When optimizing for size, sqrt calls can be incorrectly selected as
AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
`Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
definition. Even if the target does not support AVX512, the class can
apparently still be chosen, leading to an incorrect selection of
`vsqrtss`.
In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
requires an XMM register, which is not available on i686 CPUs.
Reviewers: grosbach, resistor, joker.eph
Subscribers: spatel, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D17414
llvm-svn: 261360
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions. The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17366
llvm-svn: 261330
According to the SystemZ ABI, 128-bit integer types should be
passed and returned via implicit reference. However, this is
not currently implemented at the LLVM IR level for the i128
type. This does not matter when compiling C/C++ code, since
clang will implement the implicit reference itself.
However, it turns out that when calling libgcc helper routines
operating on 128-bit integers, LLVM will use i128 argument and
return value types; the resulting code is not compatible with
the ABI used in libgcc, leading to crashes (see PR26559).
This should be simple to fix, except that i128 currently is not
even a legal type for the SystemZ back end. Therefore, common
code will already split arguments and return values into multiple
parts. The bulk of this patch therefore consists of detecting
such parts, and correctly handling passing via implicit reference
of a value split into multiple parts. If at some time in the
future, i128 becomes a legal type, this code can be removed again.
This fixes PR26559.
llvm-svn: 261325
This is effectively NFC because Atom is the only in-order x86 subtarget currently,
but the predicate would have become wrong if any other in-order CPU came along.
See related discussion in:
http://reviews.llvm.org/D16836
llvm-svn: 261275
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
Obviously the pass still only used from PPC at this point. Subsequent
patches will start driving this from ARM64 as well.
Due to the previous patch most lines should show up as moved lines.
llvm-svn: 261265
This is done only to make the next patch that move the pass out PPC to
Transforms easier to read. After this most line should show up as moved
lines in that patch.
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
llvm-svn: 261264
If we know that all of our successors want to be in the exact same
state, it makes sense to hoist the state transition into their common
predecessor.
Differential Revision: http://reviews.llvm.org/D17391
llvm-svn: 261262
In r260133, LLVM was changed to no longer extend i8/i16 return values,
as it's not required by the ABI. However, code was found in the wild
that relies on the old behaviour on Darwin, so this commit reverts
back to that old behaviour for Darwin.
On other platforms, it's less likely that code would be depending on
the old behaviour, as GCC and MSVC haven't been extending such return
values.
llvm-svn: 261235
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.
IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.
Differential Revision: http://reviews.llvm.org/D17276
llvm-svn: 261224
Compiling Hexagon target with GCC 6 produces "error: should have been
declared inside" due to GCC PR c++/69657 which was merged.
Properly wrapping operator<<() definitions within the namespace llvm
fixes the issue.
Author: domagoj.stolfa
Differential Revision: http://reviews.llvm.org/D17281
llvm-svn: 261220
In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool.
The test case in question makes use of this to recognise the shuffle mask is an unary UNPCKL pattern and simplifies accordingly.
llvm-svn: 261201
While we still do want reducible control flow, the RequiresStructuredCFG
flag imposes more strict structure constraints than WebAssembly wants.
Unsetting this flag enables critical edge splitting and tail merging.
Also, disable TailDuplication explicitly, as it doesn't support virtual
registers, and was previously only disabled by the RequiresStructuredCFG
flag.
llvm-svn: 261190
Changes:
- Added disassembler project
- Fixed all decoding conflicts in .td files
- Added DecoderMethod=“NONE” option to Target.td that allows to
disable decoder generation for an instruction.
- Created decoding functions for VS_32 and VReg_32 register classes.
- Added stubs for decoding all register classes.
- Added several tests for disassembler
Disassembler only supports:
- VI subtarget
- VOP1 instruction encoding
- 32-bit register operands and inline constants
[Valery]
One of the point that requires to pay attention to is how decoder
conflicts were resolved:
- Groups of target instructions were separated by using different
DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate
approach.
- There were conflicts in IMAGE_<> instructions caused by two
different reasons:
1. dmask wasn’t specified for the output (fixed)
2. There are image instructions that differ only by the number of
the address components but have the same encoding by the HW spec. The
actual number of address components is determined by the HW at runtime
using image resource descriptor starting from the VGPR encoded in an
IMAGE instruction. This means that we should choose only one instruction
from conflicting group to be the rule for decoder. I didn’t find the way
to disable decoder generation for an arbitrary instruction and therefore
made a onelinear fix to tablegen generator that would suppress decoder
generation when DecoderMethod is set to “NONE”. This is a change that
should be reviewed and submitted first. Otherwise I would need to
specify different DecoderNamespace for every instruction in the
conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not
used in other targets.
3. IMAGE_GATHER decoder generation is for now disabled and to be
done later.
[/Valery]
Patch By: Sam Kolton
Differential Revision: http://reviews.llvm.org/D16723
llvm-svn: 261185
These passes are optimizations, and should be disabled when not
optimizing.
Also create an MCCodeGenInfo so the opt level is correctly plumbed to
the backend pass manager.
Also remove the command line flag for disabling register coloring;
running llc with -O0 should now be useful for debugging, so it's not
necessary.
Differential Revision: http://reviews.llvm.org/D17327
llvm-svn: 261176
After r261154, we were only clearing flags if the known-zero register was
originally live-in to the basic block, but we have to do it even if not when
more than one COPY has been eliminated, otherwise the user of the first COPY
may still have <kill> marked.
E.g.
BB#N:
%X0 = COPY %XZR
STRXui %X0<kill>, <fi#0>
%X0 = COPY %XZR
STRXui %X0<kill>, <fi#1>
We can eliminate both copies, X0 is not live-in, but we must clear the kill on
the first store.
Unfortunately, I've been unable to come up with a non-fragile test for this.
I've only seen it in the wild with regalloc-created spills, and attempts to
reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile
asm clobbers were moved around. Should fix the aarch64 bot though.
llvm-svn: 261175
Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we
didn't sort out the use-def chain properly.
I've also made it check more than just the last instruction for a compatible
CBZ (so it can cope without fallthroughs). I'd have liked to do that
separately, but it's helps writing the test.
Finally, I removed some custom loops in favour of MachineInstr helpers and
refactored the control flow to flatten it and avoid possibly quadratic
iterations in blocks with many copies. NFC for these, just a general tidy-up.
llvm-svn: 261154
32-bit x86 Windows targets use a linked-list of nodes allocated on the
stack, referenced to via thread-local storage. The personality routine
interprets one of the fields in the node as a 'state number' which
indicates where the personality routine should transfer control.
State transitions are possible only before call-sites which may throw
exceptions. Our previous scheme had us update the state number before
all call-sites which may throw.
Instead, we can try to minimize the number of times we need to store by
reasoning about the nearest store which dominates the current call-site.
If the last store agrees with the current call-site, then we know that
the state-update is redundant and can be elided.
This is largely straightforward: an RPO walk of the blocks allows us to
correctly forward propagate the information when the function is a DAG.
Currently, loops are not handled optimally and may trigger superfluous
state stores.
Differential Revision: http://reviews.llvm.org/D16763
llvm-svn: 261122
Summary:
Previously the machine instructions for bar.sync &co. were not marked as
convergent. This resulted in some MI passes (such as TailDuplication,
fixed in an upcoming patch) doing unsafe things to these instructions.
Reviewers: jingyue
Subscribers: llvm-commits, tra, jholewinski, hfinkel
Differential Revision: http://reviews.llvm.org/D17318
llvm-svn: 261115
Summary:
Otherwise we'll try to do unsafe optimizations on these MIs, such as
sinking loads below calls.
(I suspect that this is not the only bug in the NVPTX instruction
tablegen files; I need to comb through them.)
Reviewers: jholewinski, tra
Subscribers: jingyue, jhen, llvm-commits
Differential Revision: http://reviews.llvm.org/D17315
llvm-svn: 261113
Bug description:
The bug was discovered when test was compiled with -O0.
In case scatter result is DAG root , VectorLegalizer failed (assert) due to LowerMSCATTER() return kmask as result.
Change LowerMSCATTER() to return chain as original node do.
Differential Revision: http://reviews.llvm.org/D17331
llvm-svn: 261090
This section is used for debug information and has no need to be
in memory at runtime. This patch also fixes an error when compiling
the Linux kernel. The error is that there are relocations within the
.pdr section in a VDSO. SHT_REL was removed as it is a section type
and not a section flag, therefore it does not make sense for it to
be there. With this patch, LLVM now emits the same flags as
the GNU assembler.
llvm-svn: 261083
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.
Part 2 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261082
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.
Part 1 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261081
Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow.
Renamed from V to Ops to more closely match their purpose.
llvm-svn: 261078
This fixes very slow compilation on
test/CodeGen/Generic/2010-11-04-BigByval.ll . Note that MaxStoresPerMemcpy
and friends are not yet carefully tuned so the cutoff point is currently
somewhat arbitrary. However, it's important that there be a cutoff point
so that we don't emit unbounded quantities of loads and stores.
llvm-svn: 261050
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.
llvm-svn: 261039
Currently, we sometimes miscompile this vector pattern:
(c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
When we have SSSE3, we incorrectly lower that to PSIGN, which does:
(c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
(c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.
Note that the PSIGN tests are also incorrect. Consider:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
if %b is zero:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.
Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
Fixes PR26110.
Differential Revision: http://reviews.llvm.org/D17181
llvm-svn: 261023
The register stackifier currently checks for intervening stores (and
loads that may alias them) but doesn't account for the fact that the
instruction being moved may affect intervening loads.
Differential Revision: http://reviews.llvm.org/D17298
llvm-svn: 261014
The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes.
Another way is to put a .word32 and mix code and data within a function. The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware.
This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits. Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register.
llvm-svn: 261006
Summary:
This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :
BB0:
cbz x0, .BB1
BB1:
mov x0, xzr
Jun
Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier
Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D16203
llvm-svn: 261004
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260998
CopyToReg nodes don't support FrameIndex operands. Other targets select
the FI to some LEA-like instruction, but since we don't have that, we
need to insert some kind of instruction that can take an FI operand and
produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy
copy_local between Op and its FI operand. This results in a redundant
copy which we should optimize away later (maybe in the post-FI-lowering
peephole pass).
Differential Revision: http://reviews.llvm.org/D17213
llvm-svn: 260987
Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler.
Reviewers: vpykhtin, tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, nhaustov
Projects: #llvm-amdgpu-spb
Differential Revision: http://reviews.llvm.org/D16920
llvm-svn: 260986
WebAssembly doesn't require full RPO; topological sorting is sufficient and
can preserve more of the MachineBlockPlacement ordering. Unfortunately, this
still depends a lot on heuristics, because while we use the
MachineBlockPlacement ordering as a guide, we can't use it in places where
it isn't topologically ordered. This area will require further attention.
llvm-svn: 260978
This avoids some complications updating LiveIntervals to be aware of the new
register lifetimes, because we can just compute new intervals from scratch
rather than describe how the old ones have been changed.
llvm-svn: 260971
Once a pointer is turned into a reference it cannot be nullptr, clang
rightfully warns about this assert being a tautology. Put the assert
before the reference is created.
llvm-svn: 260949
Summary:
In order to pass the tests, this required marking R_MIPS_16 relocations
as needing to point to the symbol and not the section.
Reviewers: vkalintiris, dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17200
llvm-svn: 260896
Summary:
This section is used for debug information and has no need to be
in memory at runtime. With this patch, LLVM now emits the same flags as
the GNU assembler. This patch also fixes an error when compiling
the Linux kernel, The error is that there are relocations within the
.pdr section in a VDSO.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D17199
llvm-svn: 260879
If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes.
Differential Revision: http://reviews.llvm.org/D17138
llvm-svn: 260878
r180893 added an indirect include of llvm/Config/Targets.def to
llvm/Support/CodeGen.h, which in turn is included by things like
llvm/IR/Module.h. After a full build of LLVM and Clang, ninja had to
rebuild 1274 files after reconfiguring.
This commit strips CodeGen.h back down to just a pile of enums and moves
the expensive includes over to CodeGenCWrappers.h (which is only
included in two places). This gets ninja down to 88 files if you
reconfigure with, e.g., -DLLVM_TARGETS_TO_BUILD=X86.
llvm-svn: 260835
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.
On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
This patch has several benefits:
* Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
* Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
* Matching the repeating shuffle makes use of a lot of existing shuffle lowering.
There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review.
Differential Revision: http://reviews.llvm.org/D16537
llvm-svn: 260834
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.
I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)
Differential Revision: http://reviews.llvm.org/D17219
llvm-svn: 260828
Tests for the new scalarize all private access options will be
included with a future commit.
The only functional change is to make the split/scalarize behavior
for private access of > 4 element vectors to be consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.
llvm-svn: 260804