Summary:
The SVE masked load and store intrinsics introduced in D76688 rely on
common llvm.masked.load/store nodes. This patch creates new ISD nodes
for LD1(S) & ST1 to remove this dependency.
Additionally, this adds support for sign & zero extending
loads and truncating stores.
Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78204
This reverts commit 17b1869b72.
It is an attempt to fix the failure reported at
The patch differs from the original one reviwed at
https://reviews.llvm.org/D77435 only for the use of the std::make_tuple
in building the return value of `findAddrModeSVELoadStore`:
- return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
+ return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase,
the original patch submitted at
fc4e954ed5
was failing the following build:
http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/
with error:
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10:
error: chosen constructor is explicit in copy-initialization
return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
note: explicit constructor declared here
constexpr tuple(_UElements&&... __elements)
^
1 error generated.
Summary:
This change is fixing an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices), instead of unscaled offsets.
Those addressing modes do not exist for `prfh` , `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction.
FWIW, GCC also emits a `prfb` for these cases.
Reviewers: sdesmalen, andwar, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78069
Summary:
Fixed wrong conditions for generating (S[LR]I X, Y, C2) from
(or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The
optimisation is also enabled by default now.
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77387
Summary:
The renaming is necessary to make the naming scheme uniform with other
gather/scatter load/stores SVE intrinsics.
The naming of variables and functions have been adapted to make it
explicit whether we are dealing with a scalar offset (which is
unscaled) or an index (which is scaled according to the data type of
the lanes of the vector).
Reviewers: andwar, sdesmalen, rengolin
Reviewed By: andwar
Subscribers: tschuett, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77839
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: mcrosier, efriedma, sdesmalen
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77269
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
On Darwin these need to be selected into a function call for the TLS
address lookup. As a result, they can't be moved below a physreg write,
which happens in call sequences. In the long term, we should have some
mechanism in the localizer to prevent localizing into target-specific
atomic instruction sequences.
rdar://60056248
Differential Revision: https://reviews.llvm.org/D76652
Summary:
In order to keep the names consistent with other SVE gather loads, the
intrinsics for gather prefetch are renamed as follows:
* @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather
Reviewed by: fpetrogalli
Differential Revision: https://reviews.llvm.org/D76421
Summary:
This fixes a discrepancy between the non-temporal loads/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate these intrinsics.
Reviewers: andwar, kmclaughlin, rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76237
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
* DUP <Zd>.<T>, #<imm>
* DUP <Zd>.<T>, <R><n|SP>
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D75900
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
* aarch64_sve_ldnt1_gather_index
* aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.
As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.
The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
* GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
* SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1
The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75601
Summary:
This patch adds the following LLVM IR intrinsics for SVE:
1. non-temporal gather loads
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.ldnt1.gather.scalar.offset
2. non-temporal scatter stores
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.ldnt1.gather.scalar.offset
These intrinsic are mapped to the corresponding SVE instructions
(example for half-words, zero-extending):
* ldnt1h { z0.s }, p0/z, [z0.s, x0]
* stnt1h { z0.s }, p0/z, [z0.s, x0]
Note that for non-temporal gathers/scatters, the SVE spec defines only
one instruction type: "vector + scalar". For this reason, we swap the
arguments when processing intrinsics that implement the "scalar +
vector" addressing mode:
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.ldnt1.gather.uxtw
In other words, all intrinsics for gather-loads and scatter-stores
implemented in this patch are mapped to the same load and store
instruction, respectively.
The sve2_mem_gldnt_vs multiclass (and it's counterpart for scatter
stores) from SVEInstrFormats.td was split into:
* sve2_mem_gldnt_vec_vs_32_ptrs (32bit wide base addresses)
* sve2_mem_gldnt_vec_vs_62_ptrs (64bit wide base addresses)
This is consistent with what we did for
@llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in
the spec and the implementation.
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D74858
Summary:
The following intrinsics are added:
* @llvm.aarch64.sve.ldff1.gather
* @llvm.aarch64.sve.ldff1.gather.index
* @llvm.aarch64.sve.ldff1.gather_sxtw
* @llvm.aarch64.sve.ldff1.gather.uxtw
* @llvm.aarch64.sve.ldff1.gather_sxtw.index
* @llvm.aarch64.sve.ldff1.gather.uxtw.index
* @llvm.aarch64.sve.ldff1.gather.scalar.offset
Although this patch is quite substantial, the vast majority of the
implementation is just a 'copy & paste' of the implementation of regular
gather loads, including tests. There's only a handful of new
definitions:
* AArch64ISD nodes defined in AArch64ISelLowering.h (e.g. GLDFF1)
* Seleciton DAG Types in AArch64SVEInstrInfo.td (e.g.
AArch64ldff1_gather)
* intrinsics in IntrinsicsAArch64.td (e.g. aarch64_sve_ldff1_gather)
* Pseudo instructions in SVEInstrFormats.td to workaround the issue of
use-before-def for the FFR register.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75128
In some cases Clang does not perform merging of instructions AND and TST (aka
ANDS xzr).
Example:
tst x2, x1
and x3, x2, x1
to:
ands x3, x2, x1
This patch add such merging during instruction selection: when AND is replaced
with ANDS instruction in LowerSELECT_CC, all users of AND also should be
changed for using this ANDS instruction
Short discussion on mailing list:
http://llvm.1065342.n5.nabble.com/llvm-dev-ARM-Peephole-optimization-instructions-tst-add-tp133109.html
Patch by Pavel Kosov.
Differential Revision: https://reviews.llvm.org/D71701
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.
This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.
Differential Revision: https://reviews.llvm.org/D75132
Summary:
This patch renames functions and TableGen classes for SVE gathers and
scatters. The original names implied that the corresponding
methods/classes are only suited for regular gathers/scatters (i.e. LD1
and ST1), which is not the case. Indeed, we will be re-using them for
non-temporal and first-faulting gathers/scatters in the forthcoming
patches. The new names also highlight the split into Vector-Scalar (VS)
and Scalar-Vector (SV) cases.
List of changes:
* `performLD1GatherCombine` and `performST1ScatterCombine` are renamed
as `performGatherLoadCombine` and `performScatterStoreCombine`,
respectively.
* Selection DAG types for scatters and gathers from
AArch64SVEInstrInfo.td are renamed. For example, `SDT_AArch64_GLD1` is
renamed as `SDT_AArch64_GATHER_SV`. SV stands for Scalar-Vector, as
opposed to Vector-Scalar (VS).
* The intrinsic classes from IntrinsicsAArch64.td are renamed. For
example, `AdvSIMD_GatherLoad_64bitOffset_Intrinsic` is renamed as
`AdvSIMD_GatherLoad_SV_64b_Offsets_Intrinsic`.
* Updated comments in `performGatherLoadCombine` and
`performScatterStoreCombine`.
Reviewers: sdesmalen, rengolin, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75035
Summary:
Implements the following intrinsics:
* llvm.aarch64.sve.convert.to.svbool
* llvm.aarch64.sve.convert.from.svbool
For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the
other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>.
Reviewers: sdesmalen, kmclaughlin, efriedma, dancgr, rengolin
Reviewed By: sdesmalen, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74471
Summary:
Implements the @llvm.aarch64.sve.dupq.lane intrinsic.
As specified in the ACLE, the behaviour of:
svdupq_lane_u64(data, index)
...is identical to:
svtbl(data, svadd_x(svptrue_b64(),
svand_x(svptrue_b64(), svindex_u64(0, 1), 1),
index * 2))
If the index is in the range [0,3], the operation is equivalent
to a single DUP (.q) instruction.
Reviewers: sdesmalen, c-rhodes, cameron.mcinally, efriedma, dancgr, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74734
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option
- Enable it for x86, arm and aarch64 targets
- Resolve the test failures
- Leave the llc experimental option for targets that do not
support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
Summary:
This patch implements the part of the calling convention
where SVE Vectors are passed by reference. This means the
caller must allocate stack space for these objects and
pass the address to the callee.
Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71216
Summary:
Implements the @llvm.aarch64.sve.index intrinsic, which
takes a scalar base and step value.
This patch also adds the printSImm function to AArch64InstPrinter
to ensure that immediates of type i8 & i16 are printed correctly.
Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin
Reviewed By: cameron.mcinally
Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74550
This patch added generation of SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraints satisfaction
during register allocation the bitwise insert and select patterns are matched
by pseudo bitwise select BSP instruction with not tied def.
It is expanded later after register allocation with def tied
to BSL/BIT/BIF depending on operands registers.
This allows to get rid of redundant moves.
Reviewers: t.p.northover, samparker, dmgreen
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D74147
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option
- Enable it for x86, arm and aarch64 targets
- Resolve the test failures
- Leave the llc experimental option for targets that do not
support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
Remove code from LegalizeTypes that allowed this to work.
We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.
Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.
Fixes PR44697
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D73752
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
Strict fp-to-int and int-to-fp conversions can be handled in the same way that
the non-strict versions are (by using the appropriate instruction or converting
to a function call when we have no instruction).
Differential Revision: https://reviews.llvm.org/D73625
This gets selected to the appropriate fcvt instruction. Handling from there on
isn't fully correct yet, as we need to model fcvt reading and writing to fpsr
and fpcr.
Differential Revision: https://reviews.llvm.org/D73201
These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the
corresponding FCMP and FCMPE instructions, though the handling from there on
isn't fully correct as we don't model reads and writes to FPCR and FPSR.
Differential Revision: https://reviews.llvm.org/D73368
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).
Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71215
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:
getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1
This can be used to propagate the value in CodeGenPrepare.
In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.
This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).
Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner
Reviewed by: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68203
Currently we fail to lower non-termporal stores for 256+ bit vectors
to STNPQ, because type legalization will split them up to 128 bit stores
and because there are no single non-temporal stores, creating STPNQ
in the Load/Store optimizer would be quite tricky.
This patch adds custom lowering for 256 bit non-temporal vector stores
to improve the generated code.
Reviewers: dmgreen, samparker, t.p.northover, ab
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D72919
The ACLE distinguishes between the following addressing modes for gather
loads:
* "scalar base, vector offset", and
* "vector base, scalar offset".
For the "vector base, scalar offset" case, the
`int_aarch64_sve_ld1_gather_imm` intrinsic was added in 79f2422d.
Currently, that intrinsic assumes that the scalar offset is passed as an
immediate. As a result, it does not cater for cases where scalar offset
is stored in a register.
In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all
cases are covered:
* `int_aarch64_sve_ld1_gather_imm` is renamed as
`int_aarch64_sve_ld1_gather_scalar_offset`
* new DAG combine rules are added for GLD1_IMM for scenarios where the
offset is a non-immediate scalar or an out-of-range immediate
* sve-intrinsics-gather-loads-vector-base.ll is renamed as
sve-intrinsics-gather-loads-vector-base-imm-offset.ll
* sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test
file for non-immediate offsets
Similar changes are made for scatter store intrinsics.
Reviewed By: sdesmalen, efriedma
Differential Revision: https://reviews.llvm.org/D71773
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
which is the default TLS model for non-PIC objects. This allows large/
many thread local variables or a compact/fast code in an executable.
Specification is same as that of GCC. For example, the code model
option precedes the TLS size option.
TLS access models other than local-exec are not changed. It means
supoort of the large code model is only in the local exec TLS model.
Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>)
Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard
Reviewd By: peter.smith
Committed by: peter.smith
Differential Revision: https://reviews.llvm.org/D71688
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.
Differential Revision: https://reviews.llvm.org/D70000
Summary:
Currently 32 bit unpacked offsets are passed as nxv2i64. However, as
pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead
would improve consistency with:
* how other arguments are treated
* how scatter stores are implemented
This patch makes sure that 32 bit unpacked offsets are passes as nxv2i32
instead of nxv2i64.
Reviewers: sdesmalen, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71724
These operations are needed as building blocks for promoting so they
can't be promoted themselves.
This appeared to work because the fp_extend query type for operation
actions is the result type, not the input type so it never triggered
in the legalizer.
For fp_round, the vector op legalizer just ended up creating a
nop fp_extend that was elided by getNode, followed by a nop
fp_round that was also elided by getNode. This was followed by
a final fp_round from v4f32 back to vf416 which was CSEd to the
original node. Then legalize vector ops just believed that node
legalized to itself. LegalizeDAG took another crack at promoting
it, but didn't have a handler so just skipped it with a debug
message saying it wasn't promoted.
This patch just removes the operation actions to avoid this
non-sense. Found while trying to refactor LegalizeVectorOps to
handle multiple result nodes better.
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.
Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0
Patch by: @RKSimon (Simon Pilgrim)
Differential Revision: https://reviews.llvm.org/D63815
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).
Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.
Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.
This matches what was done for X86 in 6bf108d77a.
Differential Revision: https://reviews.llvm.org/D71721
This is another potential regression exposed by D63815.
Here we peek through a bitcast to find an extract subvector and
scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'
Differential Revision: https://reviews.llvm.org/D71672
Recommit 23c28c4043 (reverted in
dcb48f50bd) with a fix for an assert
"Request for a fixed size on a scalable object" being triggered in
`LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the
TypeSize object.
Summary:
Instead of generating two i64 instructions for each load or store of a
volatile i128 value (two LDRs or STRs), now emit a single LDP or STP.
Reviewers: labrinea, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69559
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.
Original commit message:
Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Summary:
Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71000
Summary:
Recognize wide compares where the wide operand is a splat of a scalar
value in the appropriate range and convert to the immediate variant of
the instruction.
Patch by Graham Hunter
Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71009
The generated sequence with whilelo is unintuitive, but it's the best
I could come up with given the limited number of SVE instructions that
interact with scalar registers. The other sequence I was considering
was something like dup+cmpne, but an extra scalar instruction seems
better than an extra vector instruction.
Differential Revision: https://reviews.llvm.org/D71160
This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index:
* @llvm.aarch64.sve.ld1.gather.imm
This intrinsics maps 1-1 to the corresponding SVE instruction (example for half-words):
* ld1h { z0.d }, p0/z, [z0.d, #16]
Committed on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70806
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type
to contain non-scalable types.
* Explicit casts for several places where implicit integer sign
changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types
as different.
Reviewers: greened, cameron.mcinally, sdesmalen, rovka
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D66871
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.
(I think this started showing up recently because we added an
optimization that converts pow to powi.)
Differential Revision: https://reviews.llvm.org/D69013
This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB
and UQSUB from the existing target independent sadd_sat, uadd_sat,
ssub_sat and usub_sat nodes.
It does not attempt to replace the existing int_aarch64_neon_uqadd
intrinsic nodes as they are apparently used for both scalar and vector,
and need to be legal on scalar types for some of the patterns to work.
The int_aarch64_neon_uqadd on scalar would move the two integers into
floating point registers, perform a Neon uqadd and move the value back.
I don't believe this is good idea for uadd_sat to do the same as the
scalar alternative is simpler (an adds with a csinv). For signed it may
be smaller, but I'm not sure about it being better.
So this just adds some extra patterns for the existing vector
instructions, matching on the _sat nodes.
Differential Revision: https://reviews.llvm.org/D69374
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy
Reviewed By: jmolloy
Differential Revision: https://reviews.llvm.org/D47775
llvm-svn: 375222
Summary:
Implements the following intrinsics:
- int_aarch64_sve_sunpkhi
- int_aarch64_sve_sunpklo
- int_aarch64_sve_uunpkhi
- int_aarch64_sve_uunpklo
This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.
This patch includes tests for the Subdivide2Argument type added by D67549
Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka
Reviewed By: greened
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D67550
llvm-svn: 375210
* Adds a TypeSize struct to represent the known minimum size of a type
along with a flag to indicate that the runtime size is a integer multiple
of that size
* Converts existing size query functions from Type.h and DataLayout.h to
return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
to uint64_t) so that most existing code 'just works' as if the return
values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
supported instructions used with scalable vectors can be constructed
in IR.
Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen
Reviewed By: rovka, sdesmalen
Differential Revision: https://reviews.llvm.org/D53137
llvm-svn: 374042
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.
Reviewers: aprantl, vsk, t.p.northover
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D66953
llvm-svn: 374033
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.
llvm-svn: 373292
This caused severe compile-time regressions, see PR43455.
> Modern processors predict the targets of an indirect branch regardless of
> the size of any jump table used to glean its target address. Moreover,
> branch predictors typically use resources limited by the number of actual
> targets that occur at run time.
>
> This patch changes the semantics of the option `-max-jump-table-size` to limit
> the number of different targets instead of the number of entries in a jump
> table. Thus, it is now renamed to `-max-jump-table-targets`.
>
> Before, when `-max-jump-table-size` was specified, it could happen that
> cluster jump tables could have targets used repeatedly, but each one was
> counted and typically resulted in tables with the same number of entries.
> With this patch, when specifying `-max-jump-table-targets`, tables may have
> different lengths, since the number of unique targets is counted towards the
> limit, but the number of unique targets in tables is the same, but for the
> last one containing the balance of targets.
>
> Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 373060
Modern processors predict the targets of an indirect branch regardless of
the size of any jump table used to glean its target address. Moreover,
branch predictors typically use resources limited by the number of actual
targets that occur at run time.
This patch changes the semantics of the option `-max-jump-table-size` to limit
the number of different targets instead of the number of entries in a jump
table. Thus, it is now renamed to `-max-jump-table-targets`.
Before, when `-max-jump-table-size` was specified, it could happen that
cluster jump tables could have targets used repeatedly, but each one was
counted and typically resulted in tables with the same number of entries.
With this patch, when specifying `-max-jump-table-targets`, tables may have
different lengths, since the number of unique targets is counted towards the
limit, but the number of unique targets in tables is the same, but for the
last one containing the balance of targets.
Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 372893
I think we should be able to use shl instead of sshl and ushl for
positive constant shift values, unless I am missing something.
We already have the machinery in place to ensure we only replace
nodes, if the shift value is positive and <= the element width.
This is a generalization of an earlier patch rL372565.
Reviewers: t.p.northover, samparker, dmgreen, anemet
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D67955
llvm-svn: 372824
Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl,
if their first operand is extended and the second operand is a constant
Also adds a few tests marked with FIXME, where we can further increase
codegen.
Reviewers: t.p.northover, samparker, dmgreen, anemet
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D62308
llvm-svn: 372565
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
Summary:
Adds the following inline asm constraints for SVE:
- Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
- Upa: SVE predicate register with full range, P0 to P15
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin
Reviewed By: rovka
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66524
llvm-svn: 371967
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67267
llvm-svn: 371212
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67229
llvm-svn: 371200
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
Summary:
Adds the following inline asm constraints for SVE:
- w: SVE vector register with full range, Z0 to Z31
- x: Restricted to registers Z0 to Z15 inclusive.
- y: Restricted to registers Z0 to Z7 inclusive.
This change also adds the "z" modifier to interpret a register as an SVE register.
Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened
Reviewed By: sdesmalen
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66302
llvm-svn: 370673
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.
float _Complex
complex_add(float _Complex a, float _Complex b)
{
return a + b;
}
RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))
The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.
Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820
Differential Revision: https://reviews.llvm.org/D65497
llvm-svn: 370275
Neither libgcc or compiler-rt are usually used on Windows, so these
functions can't be called.
Differential revision: https://reviews.llvm.org/D66880
llvm-svn: 370204
Inserting a value into Visited has the effect of terminating a search for
predecessors if that node is seen. This is legitimate for the base address, and
acts as a slight performance optimization, but the vector-building node can be
paert of a legitimate cycle so we shouldn't stop searching there.
PR43056.
llvm-svn: 370036
The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contain argument flags which will pass to makeLibCall function.
The patch should not has any functionality changes.
Differential Revision: https://reviews.llvm.org/D65795
llvm-svn: 369622
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support the it. Unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.
In a proprietary benchmark we see a performance drop of about 12% on PNG
compression before this patch, though it passes all tests.
This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate. No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.
This problem also exists on ARM. Once a consensus is reached for AArch64, we
can work to fix ARM as well.
Authors:
- Evandro Menezes (@evandro) <e.menezes@samsung.com>
- Brian Rzycki (@brzycki) <b.rzycki@samsung.com>
Differential revision: https://reviews.llvm.org/D64805
llvm-svn: 367898
Summary:
This patch adds initial support for the SVE calling convention such that
SVE types can be passed as arguments and return values to/from a
subroutine.
The SVE AAPCS states [1]:
z0-z7 are used to pass scalable vector arguments to a subroutine,
and to return scalable vector results from a function. If a
subroutine takes arguments in scalable vector or predicate
registers, or if it is a function that returns results in such
registers, it must ensure that the entire contents of z8-z23 are
preserved across the call. In other cases it need only preserve the
low 64 bits of z8-z15, as described in §5.1.2.
p0-p3 are used to pass scalable predicate arguments to a subroutine
and to return scalable predicate results from a function. If a
subroutine takes arguments in scalable vector or predicate
registers, or if it is a function that returns results in these
registers, it must ensure that p4-p15 are preserved across the call.
In other cases it need not preserve any scalable predicate register
contents.
SVE predicate and data registers are passed indirectly (i.e. spilled to the
stack and pass the address) if they exceed the registers used for argument
passing defined by the PCS referenced above. Until SVE stack support is merged
we can't spill SVE registers to the stack, so currently an llvm_unreachable is
used where we will eventually handle this.
[1] https://static.docs.arm.com/100986/0000/100986_0000.pdf
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D65448
llvm-svn: 367859
We process 2 elements at a time and expect the number of elements to be
even. Similar to D60690.
Reviewers: dmgreen, samparker, t.p.northover
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D65400
llvm-svn: 367831
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jfb, jakehehrlich
Reviewed By: jfb
Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65514
llvm-svn: 367828
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().
Differential Revision: https://reviews.llvm.org/D65465
llvm-svn: 367474
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH
InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.
Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.
Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.
Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm
Reviewed By: spatel
Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62871
llvm-svn: 366955
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.
The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.
Differential Revision: https://reviews.llvm.org/D65167
llvm-svn: 366951
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.
Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
llvm-svn: 365010
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.
Differential Revision: https://reviews.llvm.org/D63635
llvm-svn: 364115
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.
Reviewed By: SjoerdMeijer, mstorsjo
Differential Revision: https://reviews.llvm.org/D62018
llvm-svn: 361877
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.
Fixes PR41997
Reviewers: mgrang, efriedma
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62344
llvm-svn: 361585
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.
Fixes PR41951
Reviewers: t.p.northover, dmgreen, samparker
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D60690
llvm-svn: 361235