Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how do deal with the last pattern c.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62793
llvm-svn: 364418
Summary:
Finally tying up loose ends here.
The problem is quite simple:
If we have pattern `(x >> start) & (1 << nbits) - 1`,
and then truncate the result, that truncation will be propagated upwards,
into the `and`. And that isn't currently handled.
I'm only fixing pattern `a` here,
the same fix will be needed for patterns `b`/`c` too.
I *think* this isn't missing any extra legality checks,
since we only look past truncations. Similary, i don't think
we can get any other truncation there other than i64->i32.
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62786
llvm-svn: 364417
Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there's some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all.....
llvm-svn: 364407
Summary:
The getFixupKindContainerSizeBytes function returns the size of the
instruction containing a given fixup. Currently fixup_arm_pcrel_9 is
not handled in this function, this causes an assertion failure in
the debug build and incorrect codegen in the release build.
This patch fixes the problem.
Reviewers: ostannard, simon_tatham
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63778
llvm-svn: 364404
This patch adds the PseudoCALLReg instruction which allows using an
explicit register operand as the destination for the return address.
GCC can successfully parse this form of the call instruction, which
would be used for calls to functions which do not use ra as the return
address register, such as the __riscv_save libcalls. This patch forms
the first part of an implementation of -msave-restore for RISC-V.
Differential Revision: https://reviews.llvm.org/D62685
llvm-svn: 364403
truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts.
This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type.
llvm-svn: 364401
PPCMIPeephole::emitRLDICWhenLoweringJumpTables should return a bool
value to indicate optimization is conducted or not.
Differential Revision: https://reviews.llvm.org/D63801
llvm-svn: 364383
This was just an omission in the back end. We have had the instructions for both
single and double precision for a few HW generations, but never got around to
legalizing these.
Differential revision: https://reviews.llvm.org/D63634
llvm-svn: 364373
Summary:
`catch_all` is from the first version of EH proposal and now has been
removed. There were no tests covering this, and thus no tests to remove
or fix.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63737
llvm-svn: 364360
Original patch https://reviews.llvm.org/D63659 from
Steven Perron <stevenperron@google.com>
The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in
the successors of blocks that is splits. This is fixed by calling
BasicBlock::splitBasicBlock to split the block instead of doing it
manually. This does extra work because a new conditional branch is
created in BB which is immediately replaced, but I think the simplicity
is worth it. It also helps make the code more future proof in case other
things need to be updated.
llvm-svn: 364342
I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207.
Differential Revision: https://reviews.llvm.org/D63701
llvm-svn: 364337
We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile.
Rather than adding isel checks for volatile. This patch removes the patterns and teachs DAG combine to merge them into vzload when its legal to do so.
Differential Revision: https://reviews.llvm.org/D63665
llvm-svn: 364333
"To" selects an odd-numbered GPR, and "Te" an even one. There are some
8.1-M instructions that have one too few bits in their register fields
and require registers of particular parity, without necessarily using
a consecutive even/odd pair.
Also, the constraint letter "t" should select an MVE q-register, when
MVE is present. This didn't need any source changes, but some extra
tests have been added.
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60709
llvm-svn: 364331
A refactor in r364191 changed register types from an unsigned int to the
llvm:Register class. Adjust the AVR backend to this change.
This fixes build errors when building with the experimental AVR backend
enabled.
Differential Revision: https://reviews.llvm.org/D63776
llvm-svn: 364330
This provides the low-level support to start using MVE vector types in
LLVM IR, loading and storing them, passing them to __asm__ statements
containing hand-written MVE vector instructions, and *if* you have the
hard-float ABI turned on, using them as function parameters.
(In the soft-float ABI, vector types are passed in integer registers,
and combining all those 32-bit integers into a q-reg requires support
for selection DAG nodes like insert_vector_elt and build_vector which
aren't implemented yet for MVE. In fact I've also had to add
`arm_aapcs_vfpcc` to a couple of existing tests to avoid that
problem.)
Specifically, this commit adds support for:
* spills, reloads and register moves for MVE vector registers
* ditto for the VPT predication mask that lives in VPR.P0
* make all the MVE vector types legal in ISel, and provide selection
DAG patterns for BITCAST, LOAD and STORE
* make loads and stores of scalar FP types conditional on
`hasFPRegs()` rather than `hasVFP2Base()`. As a result a few
existing tests needed their llc command lines updating to use
`-mattr=-fpregs` as their method of turning off all hardware FP
support.
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60708
llvm-svn: 364329
Summary:
In Secure PLT ABI, -fpic is similar to -fPIC. The differences are that:
* -fpic stores the address of _GLOBAL_OFFSET_TABLE_ in r30, while -fPIC stores .got2+0x8000.
* -fpic uses an addend of 0 for R_PPC_PLTREL24, while -fPIC uses 0x8000.
Reviewers: hfinkel, jhibbits, joerg, nemanjai, spetrovic
Reviewed By: jhibbits
Subscribers: adalava, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63563
llvm-svn: 364324
The expensive buildbots highlighted the mir tests were broken, which
I've now updated and added --verify-machineinstrs to them. This also
uncovered a couple of bugs in the backend pass, so these have also
been fixed.
llvm-svn: 364323
Including both 'case ARM_AM::uxtw' and 'default' in the getShiftOp
switch caused a buildbot to fail with
error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
llvm-svn: 364300
A minor iteration on the MVE VPT Block pass to enable more efficient VPT Block
code generation: consecutive VPT predicated statements, predicated on the same
condition, will be placed within the same VPT Block. This essentially is also
an exercise to write some more tests for the next step, which should be more
generic also merging instructions when they are not consecutive.
Differential Revision: https://reviews.llvm.org/D63711
llvm-svn: 364298
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.
Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.
Some notes:
- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
to a constant at compile times, which means some tests can no longer
be applied.
The current "solution" is a terrible hack, but the intrinsic isn't
used by Mesa, so we can keep it for now.
- We no longer know the full LDS size per kernel at compile time, which
means that we can no longer generate a relevant error message at
compile time. It would be possible to add a check for the size of
individual variables, but ultimately the linker will have to perform
the final check.
Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61494
llvm-svn: 364297
Summary:
The directive defines a symbol as an group/local memory (LDS) symbol.
LDS symbols behave similar to common symbols for the purposes of ELF,
using the processor-specific SHN_AMDGPU_LDS as section index.
It is the linker and/or runtime loader's job to "instantiate" LDS symbols
and resolve relocations that reference them.
It is not possible to initialize LDS memory (not even zero-initialize
as for .bss).
We want to be able to link together objects -- starting with relocatable
objects, but possible expanding to shared objects in the future -- that
access LDS memory in a flexible way.
LDS memory is in an address space that is entirely separate from the
address space that contains the program image (code and normal data),
so having program segments for it doesn't really make sense.
Furthermore, we want to be able to compile multiple kernels in a
compilation unit which have disjoint use of LDS memory. In that case,
we may want to place LDS symbols differently for different kernels
to save memory (LDS memory is very limited and physically private to
each kernel invocation), so we can't simply place LDS symbols in a
.lds section.
Hence this solution where LDS symbols always stay undefined.
Change-Id: I08cbc37a7c0c32f53f7b6123aa0afc91dbc1748f
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61493
llvm-svn: 364296
If an FP_EXTEND or FP_ROUND isel dag node converts directly between
f16 and f32 when the target CPU has no instruction to do it in one go,
it has to be done in two steps instead, going via f32.
Previously, this was done implicitly, because all such CPUs had the
storage-only implementation of f16 (i.e. the only thing you can do
with one at all is to convert it to/from f32). So isel would legalize
the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp
node (or vice versa), and then the fp_extend would already be f32->f64
rather than f16->f64.
But that technique can't support a target CPU which has full f16
support but _not_ f64, such as some variants of Arm v8.1-M. So now we
provide custom lowering for FP_EXTEND and FP_ROUND, which checks
support for f16 and f64 and decides on the best thing to do given the
combination of flags it gets back.
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60692
llvm-svn: 364294
This final batch includes the tail-predicated versions of the
low-overhead loop instructions (LETP); the VPSEL instruction to select
between two vector registers based on the predicate mask without
having to open a VPT block; and VPNOT which complements the predicate
mask in place.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62681
llvm-svn: 364292
This adds the rest of the vector memory access instructions. It
includes contiguous loads/stores, with an ordinary addressing mode
such as [r0,#offset] (plus writeback variants); gather loads and
scatter stores with a scalar base address register and a vector of
offsets from it (written [r0,q1] or similar); and gather/scatters with
a vector of base addresses (written [q0,#offset], again with
writeback). Additionally, some of the loads can widen each loaded
value into a larger vector lane, and the corresponding stores narrow
them again.
To implement these, we also have to add the addressing modes they
need. Also, in AsmParser, the `isMem` query function now has
subqueries `isGPRMem` and `isMVEMem`, according to which kind of base
register is used by a given memory access operand.
I've also had to add an extra check in `checkTargetMatchPredicate` in
the AsmParser, without which our last-minute check of `rGPR` register
operands against SP and PC was failing an assertion because Tablegen
had inserted an immediate 0 in place of one of a pair of tied register
operands. (This matches the way the corresponding check for `MCK_rGPR`
in `validateTargetOperandClass` is guarded.) Apparently the MVE load
instructions were the first to have ever triggered this assertion, but
I think only because they were the first to have a combination of the
usual Arm pre/post writeback system and the `rGPR` class in particular.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62680
llvm-svn: 364291
Introduce three pseudo instructions to be used during DAG ISel to
represent v8.1-m low-overhead loops. One maps to set_loop_iterations
while loop_decrement_reg is lowered to two, so that we can separate
the decrement and branching operations. The pseudo instructions are
expanded pre-emission, where we can still decide whether we actually
want to generate a low-overhead loop, in a new pass:
ARMLowOverheadLoops. The pass currently bails, reverting to an sub,
icmp and br, in the cases where a call or stack spill/restore happens
between the decrement and branching instructions, or if the loop is
too large.
Differential Revision: https://reviews.llvm.org/D63476
llvm-svn: 364288
Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need
to extend to the 32-bit half, and then to 64.
I'm not sure what the line should be between what RegBankSelect
handles, and what instruction select does, but for now I'm erring on
the side of RegBankSelect for future post-RBS combines.
llvm-svn: 364212
Summary:
The LLVM disassembler assumes that the unused src0 operand of v_nop is
zero. Other tools can put another value in that field, which is still
valid. This commit fixes the LLVM disassembler to recognize such an
encoding as v_nop, in the same way as we already do for s_getpc.
Differential Revision: https://reviews.llvm.org/D63724
Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f
llvm-svn: 364208
In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of 127:16 undefined.
In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary.
Found while investigating whether (vzext_movl (scalar_to_vector (loadi32)) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case.
Differential Revision: https://reviews.llvm.org/D63700
llvm-svn: 364207
This patch does a few things to start cleaning up the isFNEG function.
-Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value.
-Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64.
-Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type.
-Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So its not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now.
Differential Revision: https://reviews.llvm.org/D63683
llvm-svn: 364206
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
This needs different handling if the source is known to be a valid
condition or not. Handle turning it into shifts or a select during
regbankselect.
llvm-svn: 364186
This matters for byval uses outside of the entry block, which appear
as copies.
Previously, the only folding done was during selection, which could
not see the underlying frame index. For any uses outside the entry
block, the frame index was materialized in the entry block relative to
the global scratch wave offset.
This may produce worse code in cases where the offset ends up not
fitting in the MUBUF offset field. A better heuristic would be helpfu
for extreme frames.
llvm-svn: 364185
This adds the family of loads and stores with names like VLD20.8 and
VST42.32, which load and store parts of multiple q-registers in such a
way that executing both VLD20 and VLD21, or all four of VLD40..VLD43,
will distribute 2 or 4 vectors' worth of memory data across the lanes
of the same number of registers but in a transposed order.
In addition to the Tablegen descriptions of the instructions
themselves, this patch also adds encode and decode support for the
QQPR and QQQQPR register classes (representing the range of loaded or
stored vector registers), and tweaks to the parsing system for lists
of vector registers to make it return the right format in this case
(since, unlike NEON, MVE regards q-registers as primitive, and not
just an alias for two d-registers).
llvm-svn: 364172
Ideally we'd be able to represent this truncate as a any_extend to
v16i32 and a truncate, but SelectionDAG doens't know how to not
fold those together.
We have isel patterns to use a vpmovzxwd+vpdmovdb for the truncate,
but we aren't able to simultaneously fold the load and the store
from the isel pattern. By pulling the truncate into the store we
can successfully hide it from the DAG combiner. Then we can isel
pattern match the truncstore and load+any_extend separately.
llvm-svn: 364163
Rename masked_load/masked_store to masked_ld/masked_st to discourage
their direct use. We need to check truncating/extending and
compressing/expanding before using them. This revealed that
our scalar masked load/store patterns were misusing these.
With those out of the way, renamed masked_load_unaligned and
masked_store_unaligned to remove the "_unaligned". We didn't
check the alignment anyway so the name was somewhat misleading.
Make the aligned versions inherit from masked_load/store instead
from a separate identical version. Merge the 3 different alignments
PatFrags into a single version that uses the VT from the SDNode to
determine the size that the alignment needs to match.
llvm-svn: 364150
After r248261, the indentation switches, inside a namespace definition,
between indenting and not indenting one level in for that namespace; the
abomination occurs in the middle of a class definition. Fix that.
llvm-svn: 364133
This is useful for allowing code to efficiently take an address
that can be later mapped onto debug info. Currently the hwasan
pass achieves this by taking the address of the current function:
http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921
but this costs two instructions (plus a GOT entry in PIC code) per function
with stack variables. This will allow the cost to be reduced to a single
instruction.
Differential Revision: https://reviews.llvm.org/D63471
llvm-svn: 364126
To help produce better diagnostics for stack use-after-return, we'd like
to be able to determine the addresses of each HWASANified function's local
variables given a small amount of information recorded on entry to the
function. Currently we require all HWASANified functions to use frame pointers
and record (PC, FP) on function entry. This works better than recording SP
because FP cannot change during the function, unlike SP which can change
e.g. due to dynamic alloca.
However, most variables currently end up using SP-relative locations in their
debug info. This prevents us from recomputing the address of most variables
because the distance between SP and FP isn't recorded in the debug info. To
address this, make the AArch64 backend prefer FP-relative debug locations
when producing debug info for HWASANified functions.
Differential Revision: https://reviews.llvm.org/D63300
llvm-svn: 364117
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.
Differential Revision: https://reviews.llvm.org/D63635
llvm-svn: 364115
128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.
This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.
Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining.
I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.
Differential Revision: https://reviews.llvm.org/D63512
llvm-svn: 364095
This should be unreachable, but bugs can make it reachable. This
adds a debug print so we can see the bad node in the output when
the llvm_unreachable triggers.
llvm-svn: 364091
With this we can now fully code generate jump tables, which is important for code size.
Differential Revision: https://reviews.llvm.org/D63223
llvm-svn: 364086
We already use vmovq for v2i64/v2f64 vzmovl. But we were using a
blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or
movsd+xorpd under optsize.
I think the blend with 0 or movss/d is only needed for
vXi32 where we don't have an instruction that can move 32
bits from one xmm to another while zeroing upper bits.
movq is no worse than blendpd on any known CPUs.
llvm-svn: 364079
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).
On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.
There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.
Differential Revision: https://reviews.llvm.org/D63587
llvm-svn: 364075
Summary:
LLVM Allows Targets to provide information that guides optimisations
made to LLVM IR. This is done with callbacks on a TargetTransformInfo object.
This patch adds a TargetTransformInfo class for RISC-V. This will allow us to
implement RISC-V specific callbacks as they become necessary.
This commit also adds the getIntImmCost callbacks, and tests them with a simple
constant hoisting test. Our immediate costs are on the conservative side, for
the moment, but we prevent hoisting in most circumstances anyway.
Previous review was on D63007
Reviewers: asb, luismarques
Reviewed By: asb
Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63433
llvm-svn: 364046
These instructions let you load half a vector register at once from
two general-purpose registers, or vice versa.
The assembly syntax for these instructions mentions the vector
register name twice. For the move _into_ a vector register, the MC
operand list also has to mention the register name twice (once as the
output, and once as an input to represent where the unchanged half of
the output register comes from). So we can conveniently assign one of
the two asm operands to be the output $Qd, and the other $QdSrc, which
avoids confusing the auto-generated AsmMatcher too much. For the move
_from_ a vector register, there's no way to get round the fact that
both instances of that register name have to be inputs, so we need a
custom AsmMatchConverter to avoid generating two separate output MC
operands. (And even that wouldn't have worked if it hadn't been for
D60695.)
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62679
llvm-svn: 364041
This adds the `MVE_qDest_rSrc` superclass and all its instances, plus
a few other instructions that also take a scalar input register or two.
I've also belatedly added custom diagnostic messages to the operand
classes for odd- and even-numbered GPRs, which required matching
changes in two of the existing MVE assembly test files.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62678
llvm-svn: 364040
The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but its getting better.
llvm-svn: 364038
Summary:
This adds the `MVE_qDest_qSrc` superclass and all instructions that
inherit from it. It's not the complete class of _everything_ with a
q-register as both destination and source; it's a subset of them that
all have similar encodings (but it would have been hopelessly unwieldy
to call it anything like MVE_111x11100).
This category includes add/sub with carry; long multiplies; halving
multiplies; multiply and accumulate, and some more complex
instructions.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62677
llvm-svn: 364037
Summary:
These take a pair of vector register to compare, and a comparison type
(written in the form of an Arm condition suffix); they output a vector
of booleans in the VPR register, where predication can conveniently
use them.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62676
llvm-svn: 364027
Every called function could possibly need this to calculate the
absolute address of stack objectst, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.
llvm-svn: 363990
Teach RegisterBankInfo to use the correct register class, and tell the
legalizer it's legal. Everything else just works.
The one thing that's slightly weird about this compared to SelectionDAG
isel is that legalization can't distinguish between i64 and <1 x i64>,
so we might end up with more NEON instructions than the user expects.
Differential Revision: https://reviews.llvm.org/D63585
llvm-svn: 363989
Summary:
BLSI sets the C flag is the input is not zero. So if its followed
by a TEST of the input where only the Z flag is consumed, we can
replace it with the opposite check of the C flag.
We should be able to do the same for BLSMSK and BLSR, but the
naive test case for those is being optimized to a subo by
CodeGenPrepare.
Reviewers: spatel, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63589
llvm-svn: 363957
The attribute can specify elimination for leaf or non-leaf, so it
should always be considered. I copied this bug from AArch64, which
probably should also be fixed.
llvm-svn: 363949
This includes integer arithmetic of various kinds (add/sub/multiply,
saturating and not), and the immediate forms of VMOV and VMVN that
load an immediate into all lanes of a vector.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62674
llvm-svn: 363936
The caller of this is looking for comparisons of the input
to these instructions with 0. But the memory instructions
input is an addess not a value input in a register.
llvm-svn: 363907
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest to fix instance, but doesn't
solve any specific problems I'm looking at.
llvm-svn: 363904
The ARMDisassembler changes allow changing between ARM and Thumb mode
based on the MCSubtargetInfo, rather than the Target, which simplifies
the other changes a bit.
I'm not really happy with adding more target-specific logic to
tools/llvm-objdump/, but there isn't any easy way around it: the logic
in question specifically applies to disassembling an object file, and
that code simply isn't located in lib/Target, at least at the moment.
Differential Revision: https://reviews.llvm.org/D60927
llvm-svn: 363903
This is incomplete, and ideally these would all be removed, but it's
better to localize them to the subtarget first with comments about
what they're for.
llvm-svn: 363902
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.
For some reason copies aren't making it through the reg_sequence,
although they should.
llvm-svn: 363876
Turns out that we can save an instruction by folding the right shift into
the compare.
Differential Revision: https://reviews.llvm.org/D63568
llvm-svn: 363874
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.
llvm-svn: 363870
This is an exception to the rule that we should prefer xmm ops to ymm ops.
As shown in PR42305:
https://bugs.llvm.org/show_bug.cgi?id=42305
...the store folding opportunity with vextractf128 may result in better
perf by reducing the instruction count.
Differential Revision: https://reviews.llvm.org/D63517
llvm-svn: 363853
We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension.
llvm-svn: 363841
This includes all the obvious bitwise operations (AND, OR, BIC, ORN,
MVN) in register-to-register forms, and the immediate forms of
AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that
access a single lane of a vector.
Some of those VMOVs (specifically, the ones that access a 32-bit lane)
share an encoding with existing instructions that were disassembled as
accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in
8.1-M they're now written as accessing a quarter of a q-register (e.g.
`vmov.32 r0, q0[2]`). The older syntax is still accepted by the
assembler.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62673
llvm-svn: 363838
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware. If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.
This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.
llvm-svn: 363806
This patch allows immediates (and CSR alias immediates) which start with
a tilde token or an exclaim (!) token to be parsed as intended.
Differential Revision: https://reviews.llvm.org/D57320
llvm-svn: 363783
Since the parser attempts to parse an operand as a register with
parentheses before parsing it as an immediate, immediates in
parentheses should not be parsed by parseRegister. However in the case
where the immediate does not start with an identifier, the LParen is not
unlexed and so the RParen causes an unexpected token error.
This patch adds the missing UnLex, and modifies the existing UnLex to
not use a buffered token, as it should always be unlexing an LParen.
Differential Revision: https://reviews.llvm.org/D57319
llvm-svn: 363782
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.
Reviewers: craig.topper, RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62896
llvm-svn: 363773
This patch adds lowering for global TLS addresses for the TLS models of
InitialExec, GlobalDynamic, LocalExec and LocalDynamic.
LocalExec support required using a 4-operand add instruction, which uses
the fourth operand to express a relocation on the symbol. The necessary
fixup is emitted when the instruction is emitted.
Differential Revision: https://reviews.llvm.org/D55305
llvm-svn: 363771
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757
Summary:
Converting the result *.{all,any}_true to a bool at the source level
generates LLVM IR that compares the result to 0. This check is
redundant since these instructions already return either 0 or 1 and
therefore conform to the BooleanContents setting for WebAssembly. This
CL adds patterns to detect and remove such redundant operations on the
result of Boolean reductions.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63529
llvm-svn: 363756
Summary:
When identifing instructions that can be folded into a MOVCC instruction,
checking for a predicate operand is not enough, also need to check for
thumb2 function, with restrict-IT, is the machine instruction eligible for
ARMv8 IT or not.
Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT"
https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf
"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to
instructions other than a single subsequent 16-bit instruction from a restricted set
are deprecated, as are explicit references to the PC within that single 16-bit
instruction. This permits the non-deprecated forms of IT and subsequent instructions
to be treated as a single 32-bit conditional instruction."
Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63474
llvm-svn: 363739
Summary:
DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering.
On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not.
This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where:
- `c1` fits into the immediate field in an `addi` instruction.
- `c1` takes fewer instructions to materialise than `c1 << c2`.
In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more targets hooks than just RISC-V.
Reviewers: asb, luismarques, efriedma
Reviewed By: asb
Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62857
llvm-svn: 363736
FP_ROUND defaults to Legal for all MVT types and nothing changes
the v4f32 entry way from this default. If we needed this line
we'd also need one for v8f32 with AVX512 which we don't have.
llvm-svn: 363719
Add `IsGP64bit` and `IsPTR64bit` to the list of `UnsupportedFeatures`
of the P5600 scheduling definitions. Also mark some MIPS 64-bit
instructions by PTR_64 and GPR_64 predicates. This reduces number
of "No schedule information for" and "lacks information for" errors
in case of marking this scheduler model as complete.
This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.
Differential Revision: https://reviews.llvm.org/D63237
llvm-svn: 363702
Set the hasNoSchedulingInfo flag for the`MipsAsmPseudoInst`. These
pseudo-instructions are never used by codegen. This flag allows to
reduce number of "No schedule information for" and "lacks information
for" errors in case of marking a scheduler model as complete.
This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.
Differential Revision: https://reviews.llvm.org/D63236
llvm-svn: 363701
This includes saturating and non-saturating shifts, both with
immediate shift count and with the shift counts given by another
vector register; VSHLC (in which the bits shifted out of each active
vector lane are shifted in to the next active lane); and also VMOVL,
which is enough like an immediate shift that it didn't fit too badly
in this category.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62672
llvm-svn: 363696
Summary:
These form a small family of their own, to go with the floating-point
VMINNM/VMAXNM instructions added in a previous commit.
They introduce the first of many special cases in the mnemonic
recognition code, because VMIN with the E suffix used by the VPT
predication system needs to avoid being interpreted as the nonexistent
instruction 'VMI' with an ordinary 'NE' condition suffix.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62671
llvm-svn: 363695
Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here.
llvm-svn: 363693
Summary:
Their names began with a mishmash of `MVE_`, `t2` and no prefix at
all. Now they all start with `MVE_`, which seems like a reasonable
choice on the grounds that (a) NEON is the thing they're most at risk
of being confused with, and (b) MVE implies Thumb-2, so a prefix
indicating MVE is strictly more specific than one indicating Thumb-2.
Reviewers: ostannard, SjoerdMeijer, dmgreen
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63492
llvm-svn: 363690
This patch adds support for generating calls through the procedure
linkage table where required for a given ExternalSymbol or GlobalAddress
callee.
Differential Revision: https://reviews.llvm.org/D55304
llvm-svn: 363686
Invert the name and return value to better reflect the imprecise
nature.
Force passing in the DefMI, since it's known in the 2 users and could
possibly fail for an arbitrary vreg.
Allow specifying a specific user instruction. Scan through use
instructions, instead of use operands. Add scan thresholds instead of
searching infinitely.
Stop using a set to track seen uses. I didn't understand this usage,
or why it would not check the last use. I don't think the use list has
any particular order.
llvm-svn: 363675
Some more refactoring, like registering the IT Block pass, less cryptic
variable names, and some simplification of loops.
Differential Revision: https://reviews.llvm.org/D63419
llvm-svn: 363666
First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.
In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep......
Differential Revision: https://reviews.llvm.org/D63326
llvm-svn: 363655
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.
For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.
llvm-svn: 363644
Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.
Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.
I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.
I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.
llvm-svn: 363643
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.
The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.
One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.
Overall, these changes improve CTMark code size on arm64 by 1.2%.
Full code size results:
Program baseline new diff
------------------------------------------------------------------------------
test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6%
test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6%
test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4%
test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2%
test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2%
test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9%
test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0%
test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0%
Geomean difference -1.2%
Differential Revision: https://reviews.llvm.org/D63303
llvm-svn: 363632
This is part of the approved D63204 pending parent revision.
This small change is in fact a part of the VOP2b legalization which
does not technically belong to wave32 support, so extracted
separately.
llvm-svn: 363625
AMDGPUPropagateAttributes will not work on function bitcatsts,
so move AMDGPUFixFunctionBitcasts before it.
Differential Revision: https://reviews.llvm.org/D63455
llvm-svn: 363614
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.
This also fixes multi-part shaders for Mesa.
Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82
Reviewers: arsenm, rampitec, t-tye
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63427
llvm-svn: 363602
Basically porting over the behaviour in AArch64ISelLowering to GISel. See
emitComparison for reference.
When we have something like this:
```
lhs = G_SUB 0, y
...
G_ICMP lhs, rhs
```
We can fold away the G_SUB and produce a cmn instead, given that we produce
the same value in NZCV.
Add a test showing that the transformation works, and also showing that we
don't perform the transformation when it's unsafe.
Also factor out the CSet emission into emitCSetForICMP.
Differential Revision: https://reviews.llvm.org/D63163
llvm-svn: 363596
We don't know if its safe to unfold if we're in 32-bit mode.
This is simlar to what was done to some load opcodes in r363523.
I think its pretty unlikely we will try to unfold these anyway so
I don't think this is testable.
llvm-svn: 363595
If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).
llvm-svn: 363592
The pass works in two modes:
Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.
Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.
Differential Revision: https://reviews.llvm.org/D63208
llvm-svn: 363586
If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.
llvm-svn: 363582
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.
This adds two new functions:
bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.
This fixes https://llvm.org/PR40759
Differential Revision: https://reviews.llvm.org/D61764
llvm-svn: 363581
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.
llvm-svn: 363579
This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment.
This is necessary for an upcoming patch to split under-aligned non-temporal vector loads.
llvm-svn: 363570
For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway.
First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.
Differential Revision: https://reviews.llvm.org/D63246
llvm-svn: 363564
The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.
Differential Revision: https://reviews.llvm.org/D63336
llvm-svn: 363556
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
Create the ARMBasicBlockUtils class for tracking and querying basic
blocks sizes so we can use them when generating low-overhead loops.
Differential Revision: https://reviews.llvm.org/D63265
llvm-svn: 363530
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
llvm-svn: 363526
It would not be safe to unfold the memory form the register form
without checking that we are compiling for 64-bit mode.
This probaby isn't a real functional issue since we are unlikely
to unfold any of these instructions since they don't have any
tied registers, aren't commutable, and don't have any inputs
other than the address.
llvm-svn: 363523
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.
The generated assembly is now more explicit about the kind of relocation
that is to be used.
Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61491
llvm-svn: 363516
This is similar logic/motivation to the select splitting in D62969.
In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.
The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.
In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).
Differential Revision: https://reviews.llvm.org/D63364
llvm-svn: 363508
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.
The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D61228
llvm-svn: 363495