This matches the other MIB methods, none of which modify the builder.
Without this, we can't chain copyImplicitOps.
Also reformat the few users, in PPCEarlyReturn.
llvm-svn: 255828
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given machine function and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
The target independent portion was committed as r255353.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15341
llvm-svn: 255821
The SystemZ linkers provide an optimization to transform a general-
or local-dynamic TLS sequence into an initial-exec sequence if possible.
Do do that, the compiler generates a function call to __tls_get_offset,
which is a brasl instruction annotated with *two* relocations:
- a R_390_PLT32DBL to install __tls_get_offset as branch target
- a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that
the TLS optimization should be performed if possible
If the optimization is performed, the brasl is replaced by an ld load
instruction.
However, *both* relocs are processed independently by the linker.
Therefore it is crucial that the R_390_PLT32DBL is processed *first*
(installing the branch target for the brasl) and the R_390_TLS_GDCALL
is processed *second* (replacing the whole brasl with an ld).
If the relocs are swapped, the linker will first replace the brasl
with an ld, and *then* install the __tls_get_offset branch target
offset. Since ld has a different layout than brasl, this may even
result in a completely different (or invalid) instruction; in any
case, the resulting code is corrupted.
Unfortunately, the way the MC common code sorts relocations causes
these two to *always* end up the wrong way around, resulting in
wrong code generation by the linker and crashes.
This patch overrides the sortRelocs routine to detect this particular
pair of relocs and enforce the required order.
llvm-svn: 255787
When comparing a zero-extended value against a constant small enough to
be in range of the inner type, it doesn't matter whether a signed or
unsigned compare operation (for the outer type) is being used. This is
why the code in adjustSubwordCmp had this assertion:
assert(C.ICmpType == SystemZICMP::Any &&
"Signedness shouldn't matter here.");
assuming the the caller had already detected that fact. However, it
turns out that there cases, in particular with always-true or always-
false conditions that have not been eliminated when compiling at -O0,
where this is not true.
Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any
here, we can simply *set* it safely to SystemZICMP::Any, however.
llvm-svn: 255786
This removes an unpleasant hack involving a global variable for special
lowering of certain memcpy calls. These are now lowered as intended in
EmitTargetCodeForMemcpy in the same way that other targets do it.
llvm-svn: 255785
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.
Differential Revision: http://reviews.llvm.org/D15039
llvm-svn: 255764
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.
These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.
Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.
New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.
Differential Revision: http://reviews.llvm.org/D15038
llvm-svn: 255762
This folds (ashr (shl a, [56,48,32,24,16]), SarConst)
into (shl, (sext (a), [56,48,32,24,16] - SarConst))
or into (lshr, (sext (a), SarConst - [56,48,32,24,16]))
depending on sign of (SarConst - [56,48,32,24,16])
sexts in X86 are MOVs.
The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size).
However the MOVs have 2 advantages to SHIFTs on x86:
1. MOVs can write to a register that differs from source.
2. MOVs accept memory operands.
This fixes PR24373.
Patch by: evgeny.v.stupachenko@intel.com
Differential Revision: http://reviews.llvm.org/D13161
llvm-svn: 255761
It adjusts from RSP-after-prologue to RBP, which is what SEH filters
need to do before they can use llvm.localrecover.
Fixes SEH filter captures, which were broken in r250088.
Issue reported by Alex Crichton.
llvm-svn: 255707
This patch improves on the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475
but only for the fmaxf() case to start, so we can sort out any bugs before
extending to fmin, f64, and vectors.
The fmax / maxnum definitions provide us flexibility for signed zeros, so the
only thing we have to worry about in this replacement sequence is NaN handling.
Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes
a problem: SelectionDAGBuilder::visitSelect() transforms compare/select
instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps
that should be checking for NaN inputs or global unsafe-math before transforming?
As it stands, that bypasses a big set of optimizations that the x86 backend
already has in PerformSELECTCombine().
Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so
we have completely unnecessary operations happening on undef elements of the
vector.
Differential Revision: http://reviews.llvm.org/D15294
llvm-svn: 255700
Summary: I'm not sure how things worked before without this.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15492
llvm-svn: 255692
Add instruction patterns for matching load and store instructions with constant
offsets in addresses. The code is fairly redundant due to the need to replicate
everything between imm, tglobaldadr, and texternalsym, but this appears to be
common tablegen practice. The main alternative appears to be to introduce
matching functions with C++ code, but sticking with purely generated matchers
seems better for now.
Also note that this doesn't yet support offsets from getelementptr, which will
be the most common case; that will depend on a change in target-independent code
in order to set the NoUnsignedWrap flag, which I'll submit separately. Until
then, the testcase uses ptrtoint+add+inttoptr with a nuw on the add.
Also implement isLegalAddressingMode with an approximation of this.
Differential Revision: http://reviews.llvm.org/D15538
llvm-svn: 255681
Summary:
We were previously selecting all constant loads to SMRD instructions and legalizing
the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass.
This new solution is more simple and also generates much better code, because
the instruction selector is able to take advantage of all the MUBUF addressing
modes that are legalization pass wasn't able to.
We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15425
llvm-svn: 255672
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:
- The APIs have access to pretty well any Pass state they want, so
it's hard to tell what they may or may not do.
- Other APIs have copied these and pass around a `Pass *` even though
they don't even use it. Some of these just hand a nullptr to the API
since the callers don't even have a pass available.
- Passes in the new pass manager don't work like the current ones, so
the APIs can't be used as is there.
Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.
llvm-svn: 255669
On SparcV8, doubles get passed in two 32-bit integer registers. The call
code was already handling endianness correctly, but the incoming
argument code was not -- it got the two halves in opposite order.
Also remove some dead code in LowerFormalArguments_32 to handle
less-than-32bit values, which can't actually happen.
Finally, add some test cases for the 32-bit calling convention, cribbed
from the 64abi.ll test, and run for both big and little-endian.
llvm-svn: 255668
We only want to emit CFI adjustments when actually using DWARF.
This fixes PR25828.
Differential Revision: http://reviews.llvm.org/D15522
llvm-svn: 255664
The radeonsi fp64 support can hit these now that some redundant bitcasts
are folded.
Patch by: Michel Dänzer
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 255657
"movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes.
This commit makes LLVM use the latter when optimizing for size.
Differential Revision: http://reviews.llvm.org/D14971
llvm-svn: 255656
Summary:
These are meant to be used instead of the llvm.SI.tid intrinsic which will
be deprecated at some point.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15475
llvm-svn: 255652
Summary:
These are meant to be used instead of the llvm.SI.fs.interp intrinsic which
will be deprecated at some point.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15474
llvm-svn: 255651
the feature flag is essential for RDPKRU and WRPKRU instruction
more about the instruction can be found in the SDM rev 56, vol 2 from http://www.intel.com/sdm
Differential Revision: http://reviews.llvm.org/D15491
llvm-svn: 255644
This patch corresponds to review:
http://reviews.llvm.org/D15117
In preparation for supporting IEEE Quad precision floating point,
this patch simply defines a feature to specify the target supports this.
For now, nothing is done with the target feature, we just don't want
warnings from the Clang FE when a user specifies -mfloat128.
Calling convention and other related work will add to this patch in
the near future.
llvm-svn: 255642
Full type legalizer that works with all vectors length - from 2 to 16, (i32, i64, float, double).
This intrinsic, for example
void @llvm.masked.scatter.v2f32(<2 x float>%data , <2 x float*>%ptrs , i32 align , <2 x i1>%mask )
requires type widening for data and type promotion for mask.
Differential Revision: http://reviews.llvm.org/D13633
llvm-svn: 255629
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.
rdar://problem/23785506
llvm-svn: 255583
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.
rdar://problem/23785506
llvm-svn: 255570
Add return type information to call and call_indirect instructions. This
allows them to be disambiguated without knowledge of the callee.
Differential Revision: http://reviews.llvm.org/D15484
llvm-svn: 255565
Implement a new BLOCK scope placement algorithm which better handles
early-return blocks and early exists from nested scopes.
Differential Revision: http://reviews.llvm.org/D15368
llvm-svn: 255564
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
When target has MMX registers, configure MVT::f128 in FR128RegClass,
with TypeSoftenFloat action, and custom actions for some opcodes.
Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
Fix infinite loop when f128 type can have
VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.
Differential Revision: http://reviews.llvm.org/D11438
llvm-svn: 255558
The WebAssemblyStoreResults pass runs before LiveVariables, so it doesn't
expect to have to keep dead flags up to date; check this with an assert.
llvm-svn: 255551
This will make the depedence graph more accurate if an alias analysis
is provided. If nullptr is specified in its place, the behavior will
remain as it is currently.
llvm-svn: 255540
This is the second in a set of patches for soft float support for ppc32,
it enables soft float operations.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D13700
llvm-svn: 255516
The .even directive aligns content to an evan-numbered address.
In at&t syntax .even
In Microsoft syntax even (without the dot).
Differential Revision: http://reviews.llvm.org/D15413
llvm-svn: 255462
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
llvm-svn: 255455
EABI attributes should only be emitted on EABI targets. This prevents the
emission of the optimization goals EABI attribute on Windows ARM.
llvm-svn: 255448
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
We don't need to pass OutStreamer as a parameter to LowerSTACKMAP and
LowerPATCHPOINT. It is a member variable of PPCAsmPrinter, and thus, is already
available. NFC.
llvm-svn: 255418
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.
Reviewers: craig.topper, RKSimon
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D14603
llvm-svn: 255415
This branch adds hints for highly biased branches on the PPC architecture. Even
in absence of profiling information, LLVM will mark code reaching unreachable
terminators and other exceptional control flow constructs as highly unlikely to
be reached.
Patch by Tom Jablin!
llvm-svn: 255398
Many tests are now passing due to eliminateFrameIndex implementation and
the list needs to be re-triaged because it unblocks other failures, and
some previous failures are different. However I'm about to churn it more
by implementing more lowering, so will wait on that.
llvm-svn: 255396
Summary:
Use the SP32 physical register as the base for FrameIndex
lowering. Update it and the __stack_pointer global var in the prolog and
epilog. Extend the mapping of virtual registers to wasm locals to
include the physical registers.
Rather than modify the target-independent PrologEpilogInserter (which
asserts that there are no virtual registers left) include a
slightly-modified copy for Wasm that does not have this assertion and
only clears the virtual registers if scavenging was needed (which of
course it isn't for wasm).
Differential Revision: http://reviews.llvm.org/D15344
llvm-svn: 255392
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.
Reviewers: craig.topper, RKSimon
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D14603
llvm-svn: 255391
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.
This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).
Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.
This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033
Differential Revision: http://reviews.llvm.org/D15320
llvm-svn: 255362
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.
Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.
Example from AArch64:
def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
(i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
Which is trying to do sext_inreg i8, i8.
llvm-svn: 255359
Summary:
ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use
and def the stack pointer. Since they do not, all the nodes are being
eliminated by DeadMachineInstructionElim, so they aren't in the IR when
PrologEpilogInserter/eliminateCallFramePseudo needs them.
This change fixes that, but since RegStackify will not stackify across
them (and it runs early, before PEI), change LowerCall to only emit them
when the call frame size is > 0. That makes the current code work the
same way and makes code handled by D15344 also work the same way. We can
expand the condition beyond NumBytes > 0 in the future if needed.
Reviewers: sunfish, jfb
Subscribers: jfb, dschuff, llvm-commits
Differential Revision: http://reviews.llvm.org/D15459
llvm-svn: 255356
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
addis 3, 2, b4v@toc@ha
addi 4, 3, b4v@toc@l
lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
lbz 6, 1(4) ; optimizer
lbz 7, 2(4)
lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
addis 3, 2, b4v@toc@ha
lbz 4, b4v@toc@l(3)
lbz 5, b4v@toc@l+1(3)
lbz 6, b4v@toc@l+2(3)
lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.
llvm-svn: 255319
Previously in the conversion cost table there are no entries for integer-integer
conversions on SSE2. This will result in imprecise costs for certain vectorized
operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers
are counted from the result of running llc on the new test case in this patch.
Differential revision: http://reviews.llvm.org/D15132
llvm-svn: 255315
This was causing bad code gen and assembly that won't assemble, as
mixed altivec and vsx code would end up with a vsx high register
assigned to an altivec instruction, which won't work. Constraining the
classes allows the optimization to proceed.
llvm-svn: 255299
This patch corresponds to review:
http://reviews.llvm.org/D15286
LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.
llvm-svn: 255246
SystemZ needs to do its scheduling after branch relaxation, which can
only happen after block placement, and therefore the standard
PostRAScheduler point in the pass sequence is too early.
TargetMachine::targetSchedulesPostRAScheduling() is a new method that
signals on returning true that target will insert the final scheduling
pass on its own.
Reviewed by Hal Finkel
llvm-svn: 255234
ISD::FCOPYSIGN permits its operands to have differing types, and DAGCombiner
uses this. Add some def : Pat rules to expand this out into an explicit
conversion and a normal copysign operation.
llvm-svn: 255220
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15257
llvm-svn: 255204
without a frame pointer when unwind may happen.
This is a workaround for a bug in the way we emit the CFI directives for
frameless unwind information. See PR25614.
llvm-svn: 255175
Reinteroduce the code for moving ARGUMENTS back to the top of the basic block.
While the ARGUMENTS physical register prevents sinking and scheduling from
moving them, it does not appear to be sufficient to prevent SelectionDAG from
moving them down in the initial schedule. This patch introduces a patch that
moves them back to the top immediately after SelectionDAG runs.
This is still hopefully a temporary solution. http://reviews.llvm.org/D14750 is
one alternative, though the review has not been favorable, and proposed
alternatives are longer-term and have other downsides.
This fixes the main outstanding -verify-machineinstrs failures, so it adds
-verify-machineinstrs to several tests.
Differential Revision: http://reviews.llvm.org/D15377
llvm-svn: 255125
We mutated the DAG, which invalidated the node we were trying to use
as a base register. Sometimes we got away with it, but other times the
node really did get deleted before it was finished with.
Should fix PR25733
llvm-svn: 255120
The bots are now running the torture tests properly. Bin all failures from the GCC C torture tests so that we can tackle failures and make the tree go red on regressions.
llvm-svn: 255111
Summary:
Although the multiclass for i32 selects might seem redundant as it has
only one instantiation, we will use it to replace the correspondent
patterns in Mips64r6InstrInfo.td in follow-up commits.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D14612
llvm-svn: 255110
Commited patch was intended to implement LH, LHE, LHU and LHUE instructions.
After commit test-suite failed with error message in the form of:
fatal error: error in backend: Cannot select: t124: i32,ch = load<LD2[%d](tbaa=<0x94acc48>), sext from i16> t0, t2, undef:i32
For that reason I decided to revert commit r254897 and make new patch which besides implementation and standard regression tests will also have dedicated tests (CodeGen) for the above error.
llvm-svn: 255109
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).
DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.
One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.
Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.
Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!
llvm-svn: 255089
Summary:
This fixes failure when trying to select
insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.
The accompanying v4 and v8 tests fail instruction selection without this
patch.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15322
llvm-svn: 255072
On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.
This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element.
Fix for PR23022
Differential Revision: http://reviews.llvm.org/D15310
llvm-svn: 255061
Summary:
Before ARMv5T, Thumb1 code could not pop PC, as described at D14357 and D14986;
so we need the special fixup in the epilogue.
Reviewers: jroelofs, qcolombet
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15126
llvm-svn: 255047
AND/BIC instructions do accept SP/PC, so the register class should be
more generic (rGPR -> GPR) to cope with that case. Adding more tests.
llvm-svn: 255034
Summary:
We don't check the size operand on ext/dext*/ins/dins* yet because the
permitted range depends on the pos argument and we can't check that using
this mechanism.
The bug was that dextu/dinsu accepted 0..31 in the pos operand instead of 32..63.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D15190
llvm-svn: 255015
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.
The ".2h" vector type specifier is now legal (for the scalar pairwise
reduction instructions), so some unrelated tests have been modified as
different error messages are emitted. This is not a problem as the
invalid operands are still caught.
llvm-svn: 255010
FP logic instructions are supported in DQ extension on AVX-512 target.
I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions, that supported for 512-bit vector under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, FANDNPS/PD.
Differential Revision: http://reviews.llvm.org/D15110
llvm-svn: 254913
Summary: This reverts r254234, and adds a simple fix for the annoying case of use-after-free.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15236
llvm-svn: 254912
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.
This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.
All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.
Differential Revision: http://reviews.llvm.org/D15265
llvm-svn: 254909
According to x86 spec, loopz and loopnz should be supported for Intel syntax, where loopz is equivalent to loope and loopnz is equivalent to loopne.
Differential Revision: http://reviews.llvm.org/D15148
llvm-svn: 254877
This removes the code path that generate "synchronous" (only correct at call site) CFA.
We will probably want to re-introduce it once we are capable of emitting different
.eh_frame and .debug_frame sections.
Differential Revision: http://reviews.llvm.org/D14948
llvm-svn: 254874
This patch introduces a codegen-only instruction currently named br_unless,
which makes it convenient to implement ReverseBranchCondition and re-enable
the MachineBlockPlacement pass. Then in a late pass, it lowers br_unless
back into br_if.
Differential Revision: http://reviews.llvm.org/D14995
llvm-svn: 254826
Add physical register defs to instructions used from stackified
instructions to prevent them from being scheduled into the middle of
a stack sequence. This is a conservative measure which may be loosened
in the future.
Differential Revision: http://reviews.llvm.org/D15252
llvm-svn: 254811
This is just prototype for load/store for i32 types. I'll add them to
the rest of the types if we like this direction.
Differential Revision: http://reviews.llvm.org/D15197
llvm-svn: 254807
When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access. Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing. Note that the actual implementation was always bailing if the load or store wasn't simple.
Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared. In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above. These are normal loads and what most of the optimizer works with.
llvm-svn: 254805
Full varargs support will depend on prologue/epilogue support, but this patch
gets us started with most of the basic infrastructure.
Differential Revision: http://reviews.llvm.org/D15231
llvm-svn: 254799
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.
(GCC puts these instructions behind -msahf on 64-bit for the same reason.)
This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.
Differential Revision: http://reviews.llvm.org/D15240
llvm-svn: 254793
This commit adds a new target-independent calling convention for C++ TLS
access functions. It aims to minimize overhead in the caller by perserving as
many registers as possible.
The target-specific implementation for X86-64 is defined as following:
Arguments are passed as for the default C calling convention
The same applies for the return value(s)
The callee preserves all GPRs - except RAX and RDI
The access function makes C-style TLS function calls in the entry and exit
block, C-style TLS functions save a lot more registers than normal calls.
The added calling convention ties into the existing implementation of the
C-style TLS functions, so we can't simply use existing calling conventions
such as preserve_mostcc.
rdar://9001553
llvm-svn: 254737
Since BuildMI() automatically adds the implicit operands for a new instruction,
adding the old instructions CC operand resulted in that there were two CC imp-def
operands, where only one was marked as dead. This caused buildSchedGraph() to
miss dependencies on the CC reg.
Review by Ulrich Weigand
llvm-svn: 254714
Add new x86 pass which replaces address calculations in load or store instructions with def register of existing LEA (must be in the same basic block), if the LEA calculates address that differs only by a displacement. Works only with -Os or -Oz.
Differential Revision: http://reviews.llvm.org/D13294
llvm-svn: 254712
with its source instead of forcing the values on GPRs.
This improves the lowering of vector code when such bitcasts happen in the
middle of vector computations.
rdar://problem/23691584
llvm-svn: 254684
Summary:
computeRegisterLiveness and analyzePhysReg are currently getting
confused about liveness in some cases, breaking copyPhysReg's
calculation of whether AX is dead in some cases. Work around this issue
temporarily by assuming that AX is always live.
See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7
And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201.
This workaround makes the code correct but slightly inefficient, but it
seems to confuse the machine instr verifier which now things EAX was
undefined in some cases where it's being conservatively saved /
restored.
Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15198
llvm-svn: 254680
When a block has no terminator instructions, getFirstTerminator() returns
end(), which can't be used in dominance checks. Check dominance for phi
operands separately.
Also, remove some bits from WebAssemblyRegStackify.cpp that were causing
trouble on the same testcase; they were left behind from an earlier
experiment.
Differential Revision: http://reviews.llvm.org/D15210
llvm-svn: 254662
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.
* Relax type legalization checks to accept new f128 type configuration,
whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
to guarantee that returned common sub class will contain the specified
simple value type.
This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.
Differential Revision: http://reviews.llvm.org/D15134
llvm-svn: 254653
Summary:
These ADJCALLSTACK markers don't generate code, but they keep dynamic
alloca code that calls chkstk out of the prologue.
This slightly pessimizes inalloca calls by preventing some register copy
coalescing, but I can live with that.
Reviewers: qcolombet
Subscribers: hans, llvm-commits
Differential Revision: http://reviews.llvm.org/D15200
llvm-svn: 254645
In the case of a conditional branch without a preceding cmp we used to emit
a "and; cmp; b.eq/b.ne" sequence, use tbz/tbnz instead.
Differential Revision: http://reviews.llvm.org/D15122
llvm-svn: 254621
Currently "<type> ptr <reg name>" treated as <reg name> in MS inline asm, ignoring the "<type> ptr" completely and possibly ignoring the intention of the user.
Fixed llvm to produce an error when encountering "<type> ptr <reg name>" operands.
For example: andpd xmm1,xmmword ptr xmm1 --> andpd xmm1, xmm1
though andpd has 2 possible matching formats - andpd xmm, xmm/m128
Patch by: ziv.izhar@intel.com
Differential Revision: http://reviews.llvm.org/D14607
llvm-svn: 254607
Summary: This is done only when targeting HSA.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13807
llvm-svn: 254587
This call should in fact be made by RegScavenger::enterBasicBlock()
called below. The first call does nothing except for triggering UB,
indicated by UBSan (passing nullptr to memset()).
llvm-svn: 254548
std::hex is not used anywhere in LLVM code base except for this place,
and it has a known undefined behavior (at least in libstdc++ 4.9.3):
https://llvm.org/bugs/show_bug.cgi?id=18156, which fires in UBSan
bootstrap of LLVM.
llvm-svn: 254547
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.
llvm-svn: 254524
Summary: Only global or readonly segment variables should appear in object files.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15111
llvm-svn: 254519
|9B DD /7| FSTSW m2byte| Valid Valid Store FPU status word at m2byteafter checking for pending unmasked floating-point exceptions.|
|9B DF E0| FSTSW AX| Valid Valid Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.|
|DD /7 |FNSTSW *m2byte| Valid Valid Store FPU status word at m2bytewithout checking for pending unmasked floating-point exceptions.|
|DF E0 |FNSTSW *AX| Valid Valid Store FPU status word in AX register without checking for pending unmasked floating-point exceptions|
m2byte is word register, and therefor instruction operand need to be change from f32mem to i16mem.
Differential Revision: http://reviews.llvm.org/D14953
llvm-svn: 254512
On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using a FNMSUB (-(X*Y)-0)
Fix for PR24366
Differential Revision: http://reviews.llvm.org/D14909
llvm-svn: 254495
I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode.
Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets).
Differential Revision: http://reviews.llvm.org/D15074
llvm-svn: 254494
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).
The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.
rdar://22491037
llvm-svn: 254475
The values in this field are compared against getAvailableFeatures()
which returns an uint64_t. This was causing problems in an internal
branch.
llvm-svn: 254462
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.
This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.
llvm-svn: 254452
Summary:
This had been broken for a very long time, but nobody noticed until
D14357 enabled shrink-wrapping by default.
Reviewers: jroelofs, qcolombet
Subscribers: tyomitch, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14986
llvm-svn: 254444
Summary:
When not useful bits, BitWidth becomes 0 and APInt will not be happy.
See https://llvm.org/bugs/show_bug.cgi?id=25571
We can just mark the operand as IMPLICIT_DEF is none bits of it is used.
Reviewers: t.p.northover, jmolloy
Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14803
llvm-svn: 254440
The cost for scalarized operations is computed as N * (scalar operation
cost + 1 extractelement + 1 insertelement). This partially fixes
inflating the cost of scalarized operations since every operation is
scalarized and free. I don't think we want any cost asociated with
scalarization, but for now insertelement is still counted. I'm not sure
if we should pretend that insertelement is also free, or add a way
to compute a custom scalarization cost.
llvm-svn: 254438
Summary:
This makes the assembly output look nicer and there is no reason to
have custom strings for these.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D14671
llvm-svn: 254426
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.
Patch by Max Ostapenko.
Differential Revision: http://reviews.llvm.org/D14983
llvm-svn: 254404
The Statistical Profiling Extension is an optional extension to
ARMv8.2-A. Since it is an optional extension, I have added the
FeatureSPE subtarget feature to control it. The assembler-visible parts
of this extension are the new "psb csync" instruction, which is
equivalent to "hint #17", and a number of system registers.
Differential Revision: http://reviews.llvm.org/D15021
llvm-svn: 254401
Add ARMv8.2-A to TargetParser, so that it can be used by the clang
command-line options and the .arch directive.
Most testing of this will be done in clang, checking that the
command-line options that this enables work.
Differential Revision: http://reviews.llvm.org/D15037
llvm-svn: 254400
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature.
There is also one large, optional feature, which adds 16-bit floating
point versions of all existing floating-point instructions (VFP and
SIMD), this is represented by the FeatureFullFP16 subtarget feature.
Differential Revision: http://reviews.llvm.org/D15036
llvm-svn: 254399
Not sure how to test this. I noticed by inspection in the isel tables where the same pattern tried to produce DIV and DIVR or SUB and SUBR.
llvm-svn: 254388
(This is the second attempt to submit this patch. The first caused two assertion
failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.
All uses of weight-based interfaces are now updated to use probability-based
ones.
Differential revision: http://reviews.llvm.org/D14973
llvm-svn: 254377
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."
Asserts were firing in Chromium builds. See PR25687.
llvm-svn: 254366
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.
All uses of weight-based interfaces are now updated to use probability-based
ones.
Differential revision: http://reviews.llvm.org/D14973
llvm-svn: 254348
We currently output FMA instructions on targets which support both FMA4 + FMA (i.e. later Bulldozer CPUS bdver2/bdver3/bdver4).
This patch flips this so FMA4 is preferred; this is for several reasons:
1 - FMA4 is non-destructive reducing the need for mov instructions.
2 - Its more straighforward to commute and fold inputs (although the recent work on FMA has reduced this difference).
3 - All supported targets have FMA4 performance equal or better to FMA - Piledriver (bdver2) in particular has half the throughput when executing FMA instructions.
Its looks like no future AMD processor lines will support FMA4 after the Bulldozer series so we're not causing problems for later CPUs.
Differential Revision: http://reviews.llvm.org/D14997
llvm-svn: 254339
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.
If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.
This also only selectively enables all of the input registers
which are really required instead of always enabling them.
llvm-svn: 254331
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.
The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.
Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.
The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.
llvm-svn: 254329
The MachineVerifier wants to check that the register operands of an
instruction belong to the instruction's register class. RIP-relative
control flow instructions violated this by referencing RIP. While this
was fixed for SysV, it was never fixed for Win64.
llvm-svn: 254315
Re-enable shrink wrapping for PPC64 Little Endian.
One minor modification to PPCFrameLowering::findScratchRegister was necessary to handle fall-thru blocks (blocks with no terminator) correctly.
Tested with all LLVM test, clang tests, and the self-hosting build, with no problems found.
PHabricator: http://reviews.llvm.org/D14778
llvm-svn: 254314
Value of offset operand for microMIPS BALC and BC instructions is currently shifted 2 bits, but it should be 1 bit.
Differential Revision: http://reviews.llvm.org/D14770
llvm-svn: 254296
We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allow us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.
llvm-svn: 254259
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.
It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.
SystemZ benefits from this, due to its stack frame layout.
New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.
Review and help from Ulrich Weigand and Hal Finkel.
llvm-svn: 254227
Summary:
Since this build attribute corresponds to a whole module, and
different functions in a module may differ in the optimizations
enabled for them, this attribute is emitted after all functions,
and only in the case that the optimization goals for all
functions match.
Reviewers: logan, hans
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D14934
llvm-svn: 254201
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Most of these instructions are the same as the 32- and 64-bit versions,
but with the type field (bits 23-22) set to 0b11. Previously the top bit
of the size field was always 0, so the instruction classes only provided
a 1-bit size field, which I have widened to 2 bits.
Differential Revision: http://reviews.llvm.org/D15014
llvm-svn: 254198
Summary:
The bugs were:
* append, prepend, and balign were not tested
* balign takes a uimm2 not a uimm5.
* drotr32 was correctly implemented with a uimm5 but the tests expected
'52' to be valid.
* li/la were implemented with a uimm5 instead of simm32. simm32 isn't
completely correct either but I'll fix that when I get to simm32.
A notable omission are some of the shift instructions. Several of these
have been implemented using a single uimm6 instruction (rather than two
uimm5 instructions and a CodeGen-only uimm6 pseudo). These will be updated
in the uimm6 patch.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D14712
llvm-svn: 254164
ARMv8.2-A adds new variants of the "at" (address translate) system
instruction, which take the PSTATE.PAN bit (added in ARMv8.1-A). These
are a required part of ARMv8.2-A, so no additional subtarget features
are required.
Differential Revision: http://reviews.llvm.org/D15018
llvm-svn: 254159
Building on r253865 the crash is not limited to signed overflows.
Disable custom handling of unsigned 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit unsigned integer overflow.
llvm-svn: 254158
ARMv8.2-A adds a new PSTATE bit, PSTATE.UAO, which allows the LDTR/STTR
instructions to behave the same as LDR/STR with respect to execute-only
pages at higher privilege levels. New variants of the MSR/MRS
instructions are added to allow reading and writing this bit. It is a
required part of ARMv8.2-A, so no additional subtarget features are
required.
Differential Revision: http://reviews.llvm.org/D15020
llvm-svn: 254157
ARMv8.2-A adds the "dc cvap" instruction, which is a system instruction
that cleans caches to the point of persistence (for systems that have
persistent memory). It is a required part of ARMv8.2-A, so no additional
subtarget features are required.
Differential Revision: http://reviews.llvm.org/D15016
llvm-svn: 254156
ARMv8.2-A adds a new ID register, ID_A64MMFR2_EL1, which behaves in the
same way as ID_A64MMFR0_EL1 and ID_A64MMFR1_EL1. It is a required part
of ARMv8.2-A, so no additional subtarget features are required.
Differential Revision: http://reviews.llvm.org/D15017
llvm-svn: 254155
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature. There
is also one large, optional feature, which adds 16-bit floating point
versions of all existing floating-point instructions (VFP and SIMD),
this is represented by the FeatureFullFP16 subtarget feature.
Differential Revision: http://reviews.llvm.org/D15013
llvm-svn: 254154
generated for _mm_losd_s{s,d}() intrinsics and used in scalar FMAs generated
for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}().
Reviewer: David Kreitzer
Differential Revision: http://reviews.llvm.org/D14762
llvm-svn: 254140
Summary:
This returns a pointer to the dispatch packet, which can be used to load
information about the kernel dispach.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D14898
llvm-svn: 254116
This is a temporary fix to address ICE on 2005-10-21-longlonggtu.ll.
The proper fix will be to use A2_tfrsi, but it will need more work to
teach all users of A2_tfrsi to also expect a floating-point operand.
llvm-svn: 254099
v2: added more tests, moved the SALU->VALU conversion to a separate function
It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.
llvm-svn: 254095
If virtual registers are created late, mappings to WebAssembly
registers need to be added explicitly. This patch adds a function
to do so and teaches WebAssemblyPeephole to use it. This fixes
an out-of-bounds access on the WARegs vector.
llvm-svn: 254094
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
Instead of trying to move ARGUMENT instructions back up to the top after
they've been scheduled or sunk down, use a fake physical register to
create a liveness constraint that prevents ARGUMENT instructions from
moving down in the first place. This is still not entirely ideal, however
it is more robust than letting them move and moving them back.
llvm-svn: 254084
The e500mc does not actually support the mfocrf instruction; update the
processor definitions to reflect that fact.
Patch by Tom Rix (with some test-case cleanup by me).
llvm-svn: 254064
It was wrong order of operands (from intrinsic to DAG node).
I added more strict type specification for instruction selection.
Differential Revision: http://reviews.llvm.org/D14942
llvm-svn: 254059
This caused PR25607 and also caused Chromium to crash on start-up.
(Also had to update test/CodeGen/X86/avx-splat.ll, which was committed
after shrink wrapping was enabled.)
llvm-svn: 254044
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.
Fix for PR24364
Differential Revision: http://reviews.llvm.org/D14906
llvm-svn: 254016
This patch fixes the following issues:
1. Fix the return type of X86psadbw: it should not be the same type of inputs.
For vNi8 inputs the output should be vMi64, where M = N/8.
2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly.
3. Fix the definiton of PSADBW, VPSADBW, and VPSADBWY accordingly.
4. Adjust the return type when building a DAG node of X86ISD::PSADBW type.
5. Update related tests.
Differential revision: http://reviews.llvm.org/D14897
llvm-svn: 254010
We had duplicated definitions for the same hardware '[v]movq' instructions. For example with SSE:
def MOVZQI2PQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
"mov{d|q}\t{$src, $dst|$dst, $src}", // X86-64 only
[(set VR128:$dst, (v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))))],
IIC_SSE_MOVDQ>;
def MOV64toPQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
"mov{d|q}\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (v2i64 (scalar_to_vector GR64:$src)))],
IIC_SSE_MOVDQ>, Sched<[WriteMove]>;
As shown in the test case and PR25554:
https://llvm.org/bugs/show_bug.cgi?id=25554
This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction.
This patch deletes one pair of these defs.
Sadly, this won't fix the original test case in the bug report. Something else is still broken.
Differential Revision: http://reviews.llvm.org/D14941
llvm-svn: 253988
The one regression in the builtin tests is in the read2 test which now
(again) has many extra copies, but this should be solved once the pass
is replaced with a DAG combine.
llvm-svn: 253974
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.
Differential revision: http://reviews.llvm.org/D14361
llvm-svn: 253965
This patch detects the AVG pattern in vectorized code, which is simply
c = (a + b + 1) / 2, where a, b, and c have the same type which are vectors of
either unsigned i8 or unsigned i16. In the IR, i8/i16 will be promoted to
i32 before any arithmetic operations. The following IR shows such an example:
%1 = zext <N x i8> %a to <N x i32>
%2 = zext <N x i8> %b to <N x i32>
%3 = add nuw nsw <N x i32> %1, <i32 1 x N>
%4 = add nuw nsw <N x i32> %3, %2
%5 = lshr <N x i32> %N, <i32 1 x N>
%6 = trunc <N x i32> %5 to <N x i8>
and with this patch it will be converted to a X86ISD::AVG instruction.
The pattern recognition is done when combining instructions just before type
legalization during instruction selection. We do it here because after type
legalization, it is much more difficult to do pattern recognition based
on many instructions that are doing type conversions. Therefore, for
target-specific instructions (like X86ISD::AVG), we need to take care of type
legalization by ourselves. However, as X86ISD::AVG behaves similarly to
ISD::ADD, I am wondering if there is a way to legalize operands and result
types of X86ISD::AVG together with ISD::ADD. It seems that the current design
doesn't support this idea.
Tests are added for SSE2, AVX2, and AVX512BW and both i8 and i16 types of
variant vector sizes.
Differential revision: http://reviews.llvm.org/D14761
llvm-svn: 253952
Caller saved regs differ between SysV and Win64. Use the tail call available set to scavenge from.
Refactor register info to create new helper to get at tail call GPRs. Added a new test case for windows. Fixed up a number of X64 tests since now RCX is preferred over RDX on SysV.
Differential Revision: http://reviews.llvm.org/D14878
llvm-svn: 253927
With the '=' suffix now indicating which operands are output operands, it's
no longer as important to distinguish between a call's inputs and its outputs
using operand ordering, so we can go back to printing them in the normal order.
llvm-svn: 253925
This distinguishes input operands from output operands. This is something of
a syntactic experiment to see whether the mild amount of clutter this adds is
outweighed by the extra information it conveys to the reader.
llvm-svn: 253922
The current approach to using get_local and set_local is to use them
implicitly, as register uses and defs. Introduce new copy instructions
which are themselves no-ops except for the get_local and set_local
that they imply, so that we use get_local and set_local consistently.
llvm-svn: 253905
WebAssembly is currently using labels to end scopes, so for example a
loop scope looks like this:
BB0_0:
loop BB0_1
...
BB0_1:
with BB0_0 being the label of the first block not in the loop. This
requires that the label be printed even when it's only reachable via
fallthrough. To arrange this, insert a no-op LOOP_END instruction in
such cases at the end of the loop.
llvm-svn: 253901
Always starting blocks at the top of their containing loops works, but creates
unnecessarily deep nesting because it makes all blocks in a loop overlap.
Refine the BLOCK placement algorithm to start blocks at nearest common
dominating points instead, which significantly shrinks them and reduces
overlapping.
llvm-svn: 253876
Disable custom handling of signed 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit integer overflow crashes.
llvm-svn: 253865
ISERT_SUBVECTOR for i1 vectors may be done with shifts, when we insert into the lower part, or into the upper part, on into all-zero vector.
CONCAT_VECTORS uses ISERT_SUBVECTOR.
Differential Revision: http://reviews.llvm.org/D14815
llvm-svn: 253819
Extended DFA tablegen to:
- added "-debug-only dfa-emitter" support to llvm-tblgen
- defined CVI_PIPE* resources for the V60 vector coprocessor
- allow specification of multiple required resources
- supports ANDs of ORs
- e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
(SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)
- added support for combo resources
- allows specifying ORs of ANDs
- e.g. [CVI_XLSHF, CVI_MPY01] means:
(CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)
- increased DFA input size from 32-bit to 64-bit
- allows for a maximum of 4 AND'ed terms of 16 resources
- supported expressions now include:
expression => term [AND term] [AND term] [AND term]
term => resource [OR resource]*
resource => one_resource | combo_resource
combo_resource => (one_resource [AND one_resource]*)
Author: Dan Palermo <dpalermo@codeaurora.org>
kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.
Reapply the previous patch, this time without circular dependencies.
llvm-svn: 253793
Extended DFA tablegen to:
- added "-debug-only dfa-emitter" support to llvm-tblgen
- defined CVI_PIPE* resources for the V60 vector coprocessor
- allow specification of multiple required resources
- supports ANDs of ORs
- e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
(SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)
- added support for combo resources
- allows specifying ORs of ANDs
- e.g. [CVI_XLSHF, CVI_MPY01] means:
(CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)
- increased DFA input size from 32-bit to 64-bit
- allows for a maximum of 4 AND'ed terms of 16 resources
- supported expressions now include:
expression => term [AND term] [AND term] [AND term]
term => resource [OR resource]*
resource => one_resource | combo_resource
combo_resource => (one_resource [AND one_resource]*)
Author: Dan Palermo <dpalermo@codeaurora.org>
kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.
llvm-svn: 253790
It turns out we have a number of places that just grab the first type attached to a register class for various reasons. This is fine unless for some reason that type isn't legal on the current target, such as for SSE1 which doesn't support v16i8/v8i16/v4i32/v2i64 - all of which were included before 4f32 in the class.
Given that this is such a rare situation I've just re-ordered the types and placed the float types first.
Fix for PR16133
Differential Revision: http://reviews.llvm.org/D14787
llvm-svn: 253773
This change merges adjacent zero stores into a wider single store.
For example :
strh wzr, [x0]
strh wzr, [x0, #2]
becomes
str wzr, [x0]
This will fix PR25410.
llvm-svn: 253711
incorrect, as the chosen representative of the weak symbol may not live
with the code in question. Always indirect the access through the TOC
instead.
Patch by Kyle Butt!
llvm-svn: 253708
Summary:
This follows D14577 to treat ARMv6-J as an alias for ARMv6,
instead of an architecture in its own right.
The functional change is that the default CPU when targeting ARMv6-J
changes from arm1136j-s to arm1136jf-s, which is currently used as
the default CPU for ARMv6; both are, in fact, ARMv6-J CPUs.
The J-bit (Jazelle support) is irrelevant to LLVM, and it doesn't
affect code generation, attributes, optimizations, or anything else,
apart from selecting the default CPU.
Reviewers: rengolin, logan, compnerd
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14755
llvm-svn: 253675
WebAssembly does not have physical registers, so even if LLVM uses physical
registers like SP, they'll need to be lowered to virtual registers before
AsmPrinter time.
llvm-svn: 253644
coloring pass. Turn the logic into "look for an insert point and
then move things past the insert point".
No functional change intended.
llvm-svn: 253626
Summary :
* Rename isSmallTypeLdMerge() to isNarrowLoad().
* Rename NumSmallTypeMerged to NumNarrowTypePromoted.
* Use Subtarget defined as a member variable.
llvm-svn: 253587
This change extends r251438 to handle more narrow load promotions
including byte type, unscaled, and signed. For example, this change will
convert :
ldursh w1, [x0, #-2]
ldurh w2, [x0, #-4]
into
ldur w2, [x0, #-4]
asr w1, w2, #16
and w2, w2, #0xffff
llvm-svn: 253577
Copying one mask register to another under BW should be done with kmovq instruction, otherwise we can loose some bits.
Copying 8 bits under DQ may be done with kmovb.
Differential Revision: http://reviews.llvm.org/D14812
llvm-svn: 253563
The lowering patterns for X86ISD::VZEXT_MOVL for 128-bit to 256-bit vectors were just copying the lower xmm instead of actually masking off the first scalar using a blend.
Fix for PR25320.
Differential Revision: http://reviews.llvm.org/D14151
llvm-svn: 253561
Make X86AsmBackend generate smarter nops instead of a bunch of 0x90 for code alignment for CPUs which don't support long nop instructions.
Differential Revision: http://reviews.llvm.org/D14178
llvm-svn: 253557
The masked intrinsics support all integer and floating point data types. I added the pointer type to this list.
Added tests for CodeGen and for Loop Vectorizer.
Updated the Language Reference.
Differential Revision: http://reviews.llvm.org/D14150
llvm-svn: 253544
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out of tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
Note, changes in future can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
llvm-svn: 253511
It turns out we decide whether to use SjLj exceptions or some alternative in
two separate places in the backend, and they disagreed with each other. This
led to inconsistent code and is generally a terrible idea.
So make them consistent and add an assert that they *do* match (unfortunately
MCAsmInfo isn't available in opt, so it can't be used to initialise the CodeGen
version directly).
llvm-svn: 253502