This provides an optimized implementation of SADDO/SSUBO/UADDO/USUBO
as well as ADDCARRY/SUBCARRY on top of the new CC implementation.
In particular, multi-word arithmetic now uses UADDO/ADDCARRY instead
of the old ADDC/ADDE logic, which means we no longer need to use
"glue" links for those instructions. This also allows making full
use of the memory-based instructions like ALSI, which couldn't be
recognized due to limitations in the DAG matcher previously.
Also, the llvm.sadd.with.overflow et.al. intrinsincs now expand to
directly using the ADD instructions and checking for a CC 3 result.
llvm-svn: 331203
This patch moves formation of LOC-type instructions from (late)
IfConversion to the early if-conversion pass, and in some cases
additionally creates them directly from select instructions
during DAG instruction selection.
To make early if-conversion work, the patch implements the
canInsertSelect / insertSelect callbacks. It also implements
the commuteInstructionImpl and FoldImmediate callbacks to
enable generation of the full range of LOC instructions.
Finally, the patch adds support for all instructions of the
load-store-on-condition-2 facility, which allows using LOC
instructions also for high registers.
Due to the use of the GRX32 register class to enable high registers,
we now also have to handle the cases where there are still no single
hardware instructions (conditional move from a low register to a high
register or vice versa). These are converted back to a branch sequence
after register allocation. Since the expandRAPseudos callback is not
allowed to create new basic blocks, this requires a simple new pass,
modelled after the ARM/AArch64 ExpandPseudos pass.
Overall, this patch causes significantly more LOC-type instructions
to be used, and results in a measurable performance improvement.
llvm-svn: 288028
This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions. A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.
Differential Revision: http://reviews.llvm.org/D21949
llvm-svn: 275016
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.
For each instructions whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.
For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result. Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.
Based on a patch by Richard Sandiford.
llvm-svn: 236527
This the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility. This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).
When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
(except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.
The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.
However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.
These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level. This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.
Based on a patch by Richard Sandiford.
llvm-svn: 236521
The current SystemZ back-end only supports the local-exec TLS access model.
This patch adds all required CodeGen support for the other TLS models, which
means in particular:
- Expand initial-exec TLS accesses by loading TLS offsets from the GOT
using @indntpoff relocations.
- Expand general-dynamic and local-dynamic accesses by generating the
appropriate calls to __tls_get_offset. Note that this routine has
a non-standard ABI and requires loading the GOT pointer into %r12,
so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node.
- Add a new platform-specific optimization pass to remove redundant
__tls_get_offset calls in the local-dynamic model (modeled after
the corresponding X86 pass).
- Add test cases verifying all access models and optimizations.
llvm-svn: 229654
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
We previously used the default expansion to SELECT_CC, which in turn would
expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based
sequence instead.
llvm-svn: 192784
When loading immediates into a GR32, the port prefered LHI, followed by
LLILH or LLILL, followed by IILF. LHI and IILF are natural 32-bit
operations, but LLILH and LLILL also clear the upper 32 bits of the register.
This was represented as taking a 32-bit subreg of a 64-bit assignment.
Using subregs for something as simple as a move immediate was probably
a bad idea. Also, I have patches to add support for the high-word facility,
and we don't want something like LLILH and LLILL to stop the high word of
the same GPR from being used.
This patch therefore uses LHI and IILF to begin with and adds a late
machine-specific pass to use LLILH and LLILL if the other half of the
register is not live. The high-word patches extend this behavior to
IIHF, LLIHL and LLIHH.
No behavioral change intended.
llvm-svn: 191363
For now this just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189819
For now just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189469
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry. I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.
This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA. This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.
We should also try to optimize null checks so that they test the CC result
of the SRST directly. The select between null and the SRST GPR result could
then usually be deleted as dead.
llvm-svn: 188779
Perhaps predictably, doing comparison elimination on the fly during
SystemZLongBranch turned out to be a bad idea. The next patches make
use of LOAD AND TEST and BRANCH ON COUNT, both of which require
changes to earlier instructions.
No functionality change intended.
llvm-svn: 187718
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken. We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities. For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2. If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3. Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.
Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll. Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.
The patch also makes it easier to reuse CC results from other instructions.
llvm-svn: 187495
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output. This was
a useful first step, but it had two problems:
(1) The z assembler isn't traditionally supposed to perform branch shortening
or branch relaxation. We followed this rule by not relaxing branches
in assembler input, but that meant that generating assembly code and
then assembling it would not produce the same result as going directly
to object code; the former would give long branches everywhere, whereas
the latter would use short branches where possible.
(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
We would need to do something else before supporting them.
(Although COMPARE AND BRANCH does not change the condition codes,
the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
during codegen, so that we can safely lower it to a separate compare
and long branch where necessary. This is not a valid transformation
for the assembler proper to make.)
This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.
The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added. The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.
The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.
llvm-svn: 182274
This adds the actual lib/Target/SystemZ target files necessary to
implement the SystemZ target. Note that at this point, the target
cannot yet be built since the configure bits are missing. Those
will be provided shortly by a follow-on patch.
This version of the patch incorporates feedback from reviews by
Chris Lattner and Anton Korobeynikov. Thanks to all reviewers!
Patch by Richard Sandiford.
llvm-svn: 181203