This intrinsic used a typed pointer for a call target operand. This
change updates the operand to be an opaque pointer and updates all
pointers in all test files that use the intrinsic.
Differential revision: https://reviews.llvm.org/D131261
The SystemZ ABI says that 128 bit integers should be aligned to only 8 bytes.
Reviewed By: Ulrich Weigand, Nikita Popov
Differential Revision: https://reviews.llvm.org/D130900
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.
There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.
Differential Revision: https://reviews.llvm.org/D77804
This patch relands the f32 vararg assertion on z/OS fix that was reverted previously due to the testcase failing on non-z/OS platforms. It is now passing.
The tablegen lines that specify the XPLINK64 calling convention for promoting an f32 vararg to an f64 are effectively overwritten by the following tablegen line which bitcast an f64 vararg to an i64 (so that it can be used in the GPRs). Thus it becomes a bitcast from f32 to i64. We don't handle bitcasts for f32s and so this causes an assertion to be thrown.
We fix this by simplifying the tablegen lines to explicity show this behaviour, and allow the f32 in the bitcast case by first promoting it to an f64.
This PR adds support for creating leaf functions when there are no CSRs used, no function calls are made, no stack frame is acquired, and contain no try/catch/throw statements.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D129687
This is similar to D125680, but for llvm.experimental.patchpoint
(instead of llvm.experimental.stackmap).
Differential review: https://reviews.llvm.org/D129268
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:
; Before:
%res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
to label %asm.fallthrough [label %foo]
; After:
%res = callbr i8* asm "", "=r,r,!i"(i8* %x)
to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other
clone-based optimizations
(https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
(https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
When a function does a dynamic stack allocation, the function's stack
size (in the stack map) is reported as UINT64_MAX.
This change tests and documents this property.
Differential Revision: https://reviews.llvm.org/D128525
If an instruction is sunk into a loop then any kill flags on
operands declared outside the loop must be cleared as these
will be live for all loop iterations.
Fixes#46827
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D126754
During lowering of memcmp/bcmp, the check for a size of 0 is done
in 2 different ways. In rare cases this can lead to a crash in
SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause
is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a
constant int value which is not yet evaluated. When the value is
turned into a SDValue, then the evaluation is done and results in
a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special
case of 0 length to be handled, which results in an assertion.
The fix is to turn the value into a SDValue, so that both functions
use the same check.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D126900
When e.g. a VR64 register is spilled to a stack slot requiring a long
(20-bit) displacement, it is possible to use an FP opcode if the allocated
phys reg allows it. This eliminates the use of a separate LAY instruction.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D115406
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:
* If IR that contains *neither* explicit ptr nor %T* types is passed
to tools, we will now use opaque pointer mode, unless
-opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
It is possible to opt-out by calling setOpaquePointers(false) on
LLVMContext.
A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.
Differential Revision: https://reviews.llvm.org/D126689
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.
This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.
Reviewed by: nikic, MaskRay, RKSimon
Differential Revision: https://reviews.llvm.org/D126550
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).
One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! I've only ever seen this
field zero for Register entries up until now.
This diff adds tests that check the currently-working stackmap cases for i128.
This will help ensure no regressions are later introduced by D125680 (when
ready).
Note that i128 stackmap support is currently incomplete, so we cant test all
i128 functionality:
i128 constants >= 2^{63} crash LLVM
non-constant i128s crash LLVM
So this change tests only constant i128 operands of value < 2^{63}.
A couple of incorrect comments are also fixed.
Make sure to also handle extended value types to avoid crashing.
Resulting integers greater than 64 bits are not optimized (i128 is not a
legal type), and vectorizing seems to result in libcalls instead of just
scalarization.
Other extended vector types like <10 x float> are however now handled and
should result in vectorized conversions.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D125881
If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node.
AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.
Part of the work to fix the regressions in D77804
Differential Revision: https://reviews.llvm.org/D125607
Pulled out of D77804 as its going to be easier to address the regressions individually.
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553
Differential Revision: https://reviews.llvm.org/D124839
* Set MaxStoresPerMemcpy and MaxStoresPerMemset to 2.
* Optimize stores of replicated values in SystemZ::combineSTORE(). This
handles the now expanded memory operations and as well some other
pre-existing cases.
* Reject a big displacement in isLegalAddressingMode() for a vector type.
* Return true from shouldConsiderGEPOffsetSplit().
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D122105
The result of sign_extend_inreg needs to have as many sign bits
as requested by the VT argument. The easiest way to guarantee this
is to fold it to 0.
SystemZ test was modified to avoid using undef.
Fixes https://github.com/llvm/llvm-project/issues/55178
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D124696
Bail out from cases where the result is a ConstantSDNode as it cannot be
selected and should typically not end up here.
Fixes: #55204
Reviewed By: Ulrich Weigand
The recently announced IBM z16 processor implements the architecture
already supported as "arch14" in LLVM. This patch adds support for
"z16" as an alternate architecture name for arch14.
This patch boosts the inlining threshold for a particular type of functions
that are using an incoming argument only as a memcpy source.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D121341
This patch adds support for inline assembly address operands using the "p"
constraint on X86 and SystemZ.
This was in fact broken on X86 (see example at
https://reviews.llvm.org/D110267, Nov 23).
These operands should probably be treated the same as memory operands by
CodeGenPrepare, which have been commented with "TODO" there.
Review: Xiang Zhang and Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D122220
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.
Differential Revision: https://reviews.llvm.org/D120714
This patch extends support for generating huge stack frames on 64-bit XPLINK by implementing the ABI-mandated call to the stack extension routine.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D120450
Add support for va intrinsics for the XPLINK ABI.
Only the extended vararg variant, which uses a pointer to next
argument, is supported. The standard variant will build on this.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D120148
The tablegen lines that specify the XPLINK64 calling convention for promoting an f32 vararg to an f64 are effectively overwritten by the following tablegen line which bitcast an f64 vararg to an i64 (so that it can be used in the GPRs). It becomes a bitcast from f32 to i64.
Since we don't handle a bitcast for f32s this caused an assertion.
With XPLINK, dynamic stack allocations requires calling
a runtime function, which allocates the stack memory,
moves the register save area, and returns the new
stack pointer.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D119732
With XPLINK, a no-op with information about the call type is emitted
after each call instruction. Centralizing it has the advantage that it is
easy to document all cases, and it makes it easier to extend it later
(e.g. dynamic stack allocation, 32 bit mode).
Also add a test checking the call types emitted so far.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D119557
Return false from ShouldShrinkFPConstant(), so that these constants are stored
in their full size on the constant pool, even if they could have been shrunk
and used with an extending load.
This is better since LD is faster than LDE, and it also enables reg/mem opcodes.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D117927
By reordering the objects on the stack frame after looking at the users, a
better utilization of displacement operands will result. This means less
needed Load Address instructions for the accessing of these objects.
This is important for very large functions where otherwise small changes
could cause a lot more/less accesses go out of range.
Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D115690
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115410
This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (huge stack frames).
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D114457
This patch adds support for prologue and epilogue generation for
the z/OS target under the XPLINK64 ABI for functions with a stack
size of less than 1048576 bytes (huge stack frames).
Reviewed by: uweigand, Kai
Differential Revision: https://reviews.llvm.org/D114457
Memset with a constant length was implemented with a single store followed by
a series of MVC:s. This patch changes this so that one store of the byte is
emitted for each MVC, which avoids data dependencies between the MVCs. An
MVI/STC + MVC(len-1) is done for each block.
In addition, memset with a variable length is now also handled without a
libcall. Since the byte is first stored and then MVC is used from that
address, a length of two must now be subtracted instead of one for the loop
and EXRL. This requires an extra check for the one-byte case, which is
handled in a special block with just a single MVI/STC (like GCC).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D112004
Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example
%reg101 = ...
= ... reg101
%reg103 = ADD %reg101, %reg102
We can create mapping between %reg101 and %reg103.
Differential Revision: https://reviews.llvm.org/D113193
It was discovered that an extra register COPY remained when expanding a
(variable length) memory operation with a loop and there was another use of
the involved address register(s) afterwards.
A simple fix for this is to COPY the address registers before the loop and
use that new vreg instead.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D112065