In the optimizer (GVN etc.) when eliminating redundant nodes with different
flags, the flags are ignored for the purposes of testing for congruence, and
then intersected for the purposes of producing a result that supports the union
of all the uses. This commit makes SelectionDAG's CSE do the same thing,
allowing it to CSE nodes in more cases. This fixes PR26063.
Differential Revision: http://reviews.llvm.org/D15957
llvm-svn: 257940
Summary:
Rename to getCatchSwitchParentPad, to make it more clear which ancestor
the "parent" in question is. Add a comment pointing out the key feature
that the returned pad indicates which funclet contains the successor
block.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16222
llvm-svn: 257933
Since r230276, we support an improved legalization for f64->f16,
which goes through a temporary f32, improving codegen when
f32->f16 is legal but not f64->f16. This requires unsafe-fp-math.
However, that legalization assumed that the second step, producing
a pseudo-softened f16, had type i16. That's not true on targets
with illegal i16, such as ARM.
Use the initial f64->f16 result type instead.
llvm-svn: 257794
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it. Things like whether or not we can use 'rep;movs' to
copy bytes around, that sort of thing. If it changes, invariants in the
backend will quietly break. This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.
This fixes PR26124.
llvm-svn: 257730
Previous implementation in http://reviews.llvm.org/D10522
created external references to __emutls_v.* variables.
Such references are inaccurate and cannot be handled by
all linkers, e.g. Android dynamic and gold linkers for aarch64.
Now a new LowerEmuTLS pass to go through all global variables,
and add emutls_v.* and emutls_t.* variables.
These __emutls* variables have the same linkage and
visibility as the associated user defined TLS variable.
Also removed old code that dump __emutls* variables in AsmPrinter.cpp,
and updated TLS unit tests.
Differential Revision: http://reviews.llvm.org/D15300
llvm-svn: 257718
Summary:
During legalization if i16, do not ASSERTZEXT the result of FP_TO_FP16.
Directly return an FP_TO_FP16 node with return type as the
promote-to-type of i16.
This patch also removes extraneous length check. This legalization
should be valid even if integer and float types are of different
lengths.
This patch breaks a hard-float test for fp16 args. The test is changed
to allow a vmov to zero-out the top bits, and also ensure that the
return value is in an FP register.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15438
llvm-svn: 257184
Unlike my comment in 257022 said, it turns out we do handle constant vectors in the statepoint lowering, but only because SelectionDAG doesn't actually produce constants for them. Add a couple of tests which show this working.
Also, add a triple to the same test file to hopefully fix a failing bot.
It turns out we do han
llvm-svn: 257025
Currently, we try to split vectors of pointers back into their component pointer elements during rewrite-statepoints-for-gc. This is less than ideal since presumably the vectorizer chose to vectorize for a reason. :) It's also been a source of bugs - in particular, the relocation logic as currently implemented was recently discovered to be wrong.
The alternate approach is to allow gc.relocates of vector-of-pointer type and update the backend to handle them. That's what this patch tries to do. This won't actually enable vector-of-pointers in practice - there are some RS4GC changes needed - but the lowering is standalone and testable so it makes sense to separate.
Note that there are some known cases around vector constants which this patch does not handle. Once this is in, I'll send another patch with individual fixes and test cases.
Differential Revision: http://reviews.llvm.org/D15632
llvm-svn: 257022
The functionality that calculateCatchReturnSuccessorColors provides was
once non-trivial: it was a computation layered on top of funclet
coloring.
These days, LLVM IR directly encodes what
calculateCatchReturnSuccessorColors computed, obsoleting the need for
it.
No functionality change is intended.
llvm-svn: 256965
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.
This relies on the assumption that addresses can't occupy more than half the
address space, which isn't possible in C because it wouldn't be possible to
represent the difference between the start of the object and one-past-the-end
in a ptrdiff_t.
Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.
Differential Revision: http://reviews.llvm.org/D15544
llvm-svn: 256890
Summary:
This commit renames GCRelocateOperands to GCRelocateInst and makes it an
intrinsic wrapper, similar to e.g. MemCpyInst. Also, all users of
GCRelocateOperands were changed to use the new intrinsic wrapper instead.
Reviewers: sanjoy, reames
Subscribers: reames, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15762
llvm-svn: 256811
We need a frame pointer if there is a push/pop sequence after the
prologue in order to unwind the stack. Scanning the instructions to
figure out if this happened made hasFP not constant-time which is a
violation of expectations. Let's compute this up-front and reuse that
computation when we need it.
llvm-svn: 256730
Pulled out the similar CONCAT_VECTORS creation code from the 2/3 operand getNode() calls (to handle all UNDEF and all BUILD_VECTOR cases). Added a similar handler to the general getNode() call as well.
llvm-svn: 256709
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.
The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.
Differential Revision: http://reviews.llvm.org/D15054
llvm-svn: 256494
Teach the statepoint lowering code to emit Indirect stackmap entries for spill inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot.
To implement this, I introduced a new flag on the StackObject class used to maintian information about stack slots. I original considered (and prototyped in http://reviews.llvm.org/D15632), the idea of using the existing isSpillSlot flag, but end up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from. (deopt, gc, regalloc).
Differential Revision: http://reviews.llvm.org/D15759
llvm-svn: 256352
Reasons:
1) The existing form was a form of false generality. None of the implemented GCStrategies use anything other than a type. Its becoming more and more clear we're going to need some type of strong GC pointer in the type system and we shouldn't pretend otherwise at this point.
2) The API was awkward when applied to vectors-of-pointers. The old one could have been made to work, but calling isGCManagedPointer(Ty->getScalarType()) is much cleaner than the Value alternatives.
3) The rewriting implementation effectively assumes the type based predicate as well. We should be consistent.
llvm-svn: 256312
Summary:
These were deprecated 11 months ago when a generic
llvm.experimental.gc.result intrinsic, which works for all types, was added.
Reviewers: sanjoy, reames
Subscribers: sanjoy, chenli, llvm-commits
Differential Revision: http://reviews.llvm.org/D15719
llvm-svn: 256262
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.
Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.
Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.
Reviewers: aprantl
Subscribers: dsanders
Differential Revision: http://reviews.llvm.org/D14186
llvm-svn: 256077
Update supportSplitCSR's interface to take machine function instead of the
calling convention.
Review comments for http://reviews.llvm.org/D15341
llvm-svn: 255818
Summary: This patch adds a check in visitLandingPad to see if landingpad's result type is token type. If so, do not create DAG nodes for its exception pointer and selector value. This patch enables the back end to handle landingpads of token type.
Reviewers: JosephTremoulet, majnemer, rnk
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15405
llvm-svn: 255749
It appears that neither compiler-rt nor the gnu soft-float libraries actually
implement these conversions. Instead of emitting calls to library functions
that don't exist, handle it similarly to the way we handle i8 -> float and
i16 -> float conversions: call the i32 library function, and adjust the type.
Differential Revision: http://reviews.llvm.org/D15151
llvm-svn: 255643
Full type legalizer that works with all vectors length - from 2 to 16, (i32, i64, float, double).
This intrinsic, for example
void @llvm.masked.scatter.v2f32(<2 x float>%data , <2 x float*>%ptrs , i32 align , <2 x i1>%mask )
requires type widening for data and type promotion for mask.
Differential Revision: http://reviews.llvm.org/D13633
llvm-svn: 255629
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
When target has MMX registers, configure MVT::f128 in FR128RegClass,
with TypeSoftenFloat action, and custom actions for some opcodes.
Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
Fix infinite loop when f128 type can have
VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.
Differential Revision: http://reviews.llvm.org/D11438
llvm-svn: 255558
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function. This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.
Depends on D15478.
Differential Revision: http://reviews.llvm.org/D15479
llvm-svn: 255522
When FastISel fails to translate an instruction it hands off code
generation to SelectionDAG. Before it does so, it may have generated
local value instructions to feed phi nodes in successor blocks. These
instructions will then be generated again by SelectionDAG, causing
duplication and less efficient code, including extra spill
instructions.
Patch by Wolfgang Pieb!
Differential Revision: http://reviews.llvm.org/D11768
llvm-svn: 255520
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
llvm-svn: 255455
Summary:
Previously SelectionDAGBuilder asserted that the pointer operands of
memcpy / memset / memmove intrinsics are in address space < 256. This assert
implicitly assumed the X86 backend, where all address spaces < 256 are
equivalent to address space 0 from the code generator's point of view. On some
targets (R600 and NVPTX) several address spaces < 256 have a target-defined
meaning, so this assert made little sense for these targets.
This patch removes this wrong assertion and adds extra checks before lowering
these intrinsics to library calls. If a pointer operand can't be casted to
address space 0 without changing semantics, a fatal error is reported to the
user.
The new behavior should be valid for all targets that give address spaces != 0
a target-specified meaning (NVPTX, R600, X86). NVPTX lowers big or
variable-sized memory intrinsics before SelectionDAG construction. All other
memory intrinsics are inlined (the threshold is set very high for this target).
R600 doesn't support memcpy / memset / memmove library calls (previously the
illegal emission of a call to such library function triggered an error
somewhere in the code generator). X86 now emits inline loads and stores for
address spaces 256 and 257 up to the same threshold that is used for address
space 0 and reports a fatal error otherwise.
I call this a "partial fix" because there are still cases that can't be
lowered. A fatal error is reported in these cases.
Reviewers: arsenm, theraven, compnerd, hfinkel
Subscribers: hfinkel, llvm-commits, alex
Differential Revision: http://reviews.llvm.org/D7241
llvm-svn: 255441
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given calling convention and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15340
llvm-svn: 255353
PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code.
llvm-svn: 255289
During selection DAG legalization, extractelement is replaced with a load
instruction. To do this, a temporary store to the stack is used unless an
existing store is found that can be re-used.
If re-using a store, the chain going out of the store must be replaced by
the one going out of the new load (this ensures that any stores that must
take place after the store happens after the load, else the value might
be overwritten before it is loaded).
The problem is, if the extractelement index is dependent on the store
replacing the chain will introduce a cycle in the selection DAG (the load
uses the index, and by replacing the chain we will make the index dependent
on the load).
To fix this, if the index is dependent on the store, the store is skipped.
This is conservative as we may end up creating an unnecessary extra store
to the stack. However, the situation is not expected to occur very often.
Differential Revision: http://reviews.llvm.org/D15330
llvm-svn: 255114
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.
This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.
All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.
Differential Revision: http://reviews.llvm.org/D15265
llvm-svn: 254909
This is a revised version of r254655 which uses a Printable wrapper
class to avoid ambiguous overload problems.
Differential Revision: http://reviews.llvm.org/D14348
llvm-svn: 254681
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.
* Relax type legalization checks to accept new f128 type configuration,
whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
to guarantee that returned common sub class will contain the specified
simple value type.
This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.
Differential Revision: http://reviews.llvm.org/D15134
llvm-svn: 254653
vector.resize() is significantly slower than memset in many STLs
and the cost of initializing these vectors is significant on targets
with many registers. Since we don't need the overhead of a vector,
use a simple unique_ptr instead.
llvm-svn: 254526
Cost calculation for vector GEP failed with due to invalid cast to GEP index operand.
The bug is fixed, added a test.
http://reviews.llvm.org/D14976
llvm-svn: 254408
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.
Patch by Max Ostapenko.
Differential Revision: http://reviews.llvm.org/D14983
llvm-svn: 254404
SDAG currently can emit debug location for function parameters when
an llvm.dbg.declare points to either a function argument SSA temp,
or to an AllocaInst. This change extends this logic by adding a
fallback case when neither of the above is true.
This is required for SafeStack, which may copy the contents of a
byval function argument into something that is not an alloca, and
then describe the target as the new location of the said argument.
llvm-svn: 254352
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.
It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.
SystemZ benefits from this, due to its stack frame layout.
New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.
Review and help from Ulrich Weigand and Hal Finkel.
llvm-svn: 254227
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
to a simple type when lowering a truncating store of a vector type. In this
case for an EVT we'll return Expand as we should in all of the cases anyhow.
The testcase triggered at the one in VectorLegalizer::LegalizeOp, inspection
found the rest.
llvm-svn: 254061
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.
Differential revision: http://reviews.llvm.org/D14361
llvm-svn: 253965
When MergeConsecutiveStores() combines two loads and two stores into
wider loads and stores, the chain users of both of the original loads
must be transfered to the new load, because it may be that a chain
user only depends on one of the loads.
New test case: test/CodeGen/SystemZ/dag-combine-01.ll
Reviewed by James Y Knight.
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=25310#c6
llvm-svn: 253779
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out of tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
Note, changes in future can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
llvm-svn: 253511
This patch adds support for vector constant folding of integer/float comparisons.
This requires FoldConstantVectorArithmetic to support scalar constant operands (in this case ISD::CONDCASE). In future we should be able to support other scalar constant types as necessary (and possibly start calling FoldConstantVectorArithmetic for all node creations)
Differential Revision: http://reviews.llvm.org/D14683
llvm-svn: 253504
This change introduces an instrumentation intrinsic instruction for
value profiling purposes, the lowering of the instrumentation intrinsic
and raw reader updates. The raw profile data files for llvm-profdata
testing are updated.
llvm-svn: 253484
The virtual register containing the address for returned value on
stack should in the DAG be represented with a CopyFromReg node and not
a Register node. Otherwise, InstrEmitter will not make sure that it
ends up in the right register class for the target instruction.
SystemZ needs this, becuause the reg class for address registers is a
subset of the general 64 bit register class.
test/SystemZ/CodeGen/args-07.ll and args-04.ll updated to run with
-verify-machineinstrs.
Reviewed by Hal Finkel.
llvm-svn: 253461
Summary:
Now that there is a one-to-one mapping from MachineFunction to
WinEHFuncInfo, we don't need to use a DenseMap to select the right
WinEHFuncInfo for the current funclet.
The main challenge here is that X86WinEHStatePass is an IR pass that
doesn't have access to the MachineFunction. I gave it its own
WinEHFuncInfo object that it uses to calculate state numbers, which it
then throws away. As long as nobody creates or removes EH pads between
this pass and SDAG construction, we will get the same state numbers.
The other thing X86WinEHStatePass does is to mark the EH registration
node. Instead of communicating which alloca was the registration through
WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic. This
intrinsic generates no code and simply marks the alloca in use.
Reviewers: JCTremoulet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14668
llvm-svn: 253378
Statepoint lowering currently expects that the target method of a
statepoint only defines a single value. This precludes using
statepoints with ABIs that return values in multiple registers
(e.g. the SysV AMD64 ABI). This change adds support for lowering
statepoints with mutli-def targets.
llvm-svn: 253339
Richard Trieu noted that UBSan detected an overflowing shift, and the obvious fix caused a crash.
What was happening was that the shiftee (1U) was indeed too small for the possible range of shifts it had to handle, but also we were using "VT.getSizeInBits()" to get the maximum type bitwidth, but we wanted "VT.getScalarSizeInBits()" to get the vector lane size instead of the entire vector size.
Use an APInt for the shift and VT.getScalarSizeInBits().
llvm-svn: 253023
This reverts commit r252565.
This also includes the revert of the commit mentioned below in order to
avoid breaking tests in AMDGPU:
Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64"
This reverts commit r252674.
llvm-svn: 252956
Several backends have instructions to reverse the order of bits in an integer. Conceptually matching such patterns is similar to @llvm.bswap, and it was mentioned in http://reviews.llvm.org/D14234 that it would be best if these patterns were matched in InstCombine instead of reimplemented in every different target.
This patch introduces an intrinsic @llvm.bitreverse.i* that operates similarly to @llvm.bswap. For plumbing purposes there is also a new ISD node ISD::BITREVERSE, with simple expansion and promotion support.
The intention is that InstCombine's BSWAP detection logic will be extended to support BITREVERSE too, and @llvm.bitreverse intrinsics emitted (if the backend supports lowering it efficiently).
llvm-svn: 252878
- Factor out code to query and modify the sign bit of a floatingpoint
value as an integer. This also works if none of the targets integer
types is big enough to hold all bits of the floatingpoint value.
- Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
otherwise perform bit manipulation on the sign bit. The previous code
used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also
takes 34 instructions on ARM Cortex-M4. With this patch we only
require 5:
vldr d0, LCPI0_0
vmov r2, r3, d0
lsrs r2, r3, #31
bfi r1, r2, #31, #1
bx lr
(This could be further improved if the compiler would recognize that
r2, r3 is zero).
- Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
available otherwise perform bit manipulation on the sign bit.
- Perform the sign(x) test by masking out the sign bit and comparing
with 0 rather than shifting the sign bit to the highest position and
testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:
testl $32768, %eax
rather than:
shlq $48, %rax
sets %al
testb %al, %al
Differential Revision: http://reviews.llvm.org/D11172
llvm-svn: 252839
Summary:
Don't fold
(zext (and (load x), cst)) -> (and (zextload x), (zext cst))
if
(and (load x) cst)
will match as a zextload already and has additional users.
For example, the following IR:
%load = load i32, i32* %ptr, align 8
%load16 = and i32 %load, 65535
%load64 = zext i32 %load16 to i64
store i32 %load16, i32* %dst1, align 4
store i64 %load64, i64* %dst2, align 8
used to produce the following aarch64 code:
ldr w8, [x0]
and w9, w8, #0xffff
and x8, x8, #0xffff
str w9, [x1]
str x8, [x2]
but with this change produces the following aarch64 code:
ldrh w8, [x0]
str w8, [x1]
str x8, [x2]
Reviewers: resistor, mcrosier
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14340
llvm-svn: 252789
This allows avoiding the default Expand behavior which
introduces stack usage. Bitcast the scalar and replace
the missing elements with undef.
This is covered by existing tests and used by a future
commit which makes 64-bit vectors legal types on AMDGPU.
llvm-svn: 252632
This is for AMDGPU to implement v2i64 extract as extract of
half of a v4i32.
This is covered by existing tests and used by a future
commit which makes 64-bit vectors legal types on AMDGPU.
llvm-svn: 252630
I'm not sure what the point of this was. I'm not sure why
you would ever define an instruction that produces an unallocatable
register class. No tests fail with this removed, and it seems like
it should be a verifier error to define such an instruction.
This was problematic for AMDGPU because it would make bad decisions
by arbitrarily changing the register class when unsetting isAllocatable
for VS_32/VS_64, which is currently set as a workaround to this problem.
AMDGPU uses the VS_32/VS_64 register classes to represent operands which
can use either VGPRs or SGPRs. When isAllocatable is unset for these,
this would need to pick either the SGPR or VGPR class and insert either
a copy we don't want, or an illegal copy we would need to deal with
later. A semi-arbitrary register class ordering decision is made in tablegen,
which resulted in always picking a VGPR class because it happens to have
more registers than the SGPR register class. We really just want to
use whatever register class the original register had.
llvm-svn: 252565
We don't currently have any runtime library functions for operations on
f16 values (other than conversions to and from f32 and f64), so we
should always promote it to f32, even if that is not a legal type. In
that case, the f32 values would be softened to f32 library calls.
SoftenFloatRes_FP_EXTEND now needs to check the promoted operand's type,
as it may ne a no-op or require a different library call.
getCopyFromParts and getCopyToParts now need to cope with a
floating-point value stored in a larger integer part, as is the case for
any target that needs to store an f16 value in a 32-bit integer
register.
Differential Revision: http://reviews.llvm.org/D12856
llvm-svn: 252459
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.
While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.
llvm-svn: 252210
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
allow to lower call safepoints with relocates and results in a different
basic blocks. See test case for example.
Differential Revision: http://reviews.llvm.org/D14158
llvm-svn: 252028