This patch fixes the visibility and linkage information of symbols
referring to IR globals.
Emission of external declarations is now done in the first execution
of emitConstantPool rather than in emitLinkage (and a few other
places). This is the point where we have already gathered information
about used symbols (by running the MC Lower PrePass) but have not yet
started emitting any functions, so any declarations that need to be
emitted end up at the top of the file, before any functions.
This changes the order of a few directives in the final asm file which
required an update to a few tests.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D118122
The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118256
masked.atomicrmw.*.i32 intrinsics access an i32 (and then possibly
mask it), so hardcode MVT::i32 as the access type here, rather than
determining it from the pointer element type.
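A rough scalar model of the access pattern may help; the helper name and the use of a CAS loop below are illustrative assumptions, not the actual intrinsic lowering:

```
#include <atomic>
#include <cstdint>

// Illustrative model only: the hardware access is always a full i32 read
// and write; the mask merely selects which bits within it are updated.
uint32_t maskedAtomicAdd(std::atomic<uint32_t> &Word, uint32_t Val,
                         uint32_t Mask) {
  uint32_t Old = Word.load();
  while (!Word.compare_exchange_weak(Old, (Old & ~Mask) | ((Old + Val) & Mask)))
    ;
  return Old; // the access type is i32 regardless of the mask
}
```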
Differential Revision: https://reviews.llvm.org/D118336
This is a slight change because I'm using the ANY_EXTEND result
instead of the original operand, but getNode should constant fold.
While there, add a comment about why the code specifically checks
for a ConstantSDNode.
We can use the RISCVISD::GREV encoding that swaps the bits in
each byte. This allows it to use the existing computeKnownBits
support for RISCVISD::GREV.
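For reference, a scalar sketch of the byte-wise bit swap this GREV encoding performs (the specific control value is assumed here purely for illustration):

```
#include <cstdint>

// Swap the bits within each byte of X, leaving the bytes themselves in place.
uint32_t reverseBitsInEachByte(uint32_t X) {
  X = ((X & 0x55555555u) << 1) | ((X >> 1) & 0x55555555u); // swap adjacent bits
  X = ((X & 0x33333333u) << 2) | ((X >> 2) & 0x33333333u); // swap bit pairs
  X = ((X & 0x0F0F0F0Fu) << 4) | ((X >> 4) & 0x0F0F0F0Fu); // swap nibbles
  return X;
}
```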
Limit this to SSE41-AVX1 targets to avoid UNPCKL(PSHUFB,PSHUFB); pre-SSE41 we don't have PACKUSDW/BLENDW, and with AVX2 we can perform this as PERMQ(PSHUFB()).
Allows pow2 mask tests to avoid an unnecessary constant load.
Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.
We already have an ISD opcode for the more general GREV/GREVI
instruction. We can just use it with the encoding that corresponds
to the behavior of brev8. This is similar to what we do for orc.b
where we use the GORC ISD opcode.
Without AVX512 (which can efficiently extend/truncate to vXi16/vXi32), unpacking/packing to vXi16 is more efficient than relying on the (uops-heavy) PBLENDV shift expansion.
AArch64ISD::PFALSE does not provide any value; in fact, it can
prevent common combines from firing. We only needed to lower
to PFALSE until ISD::SPLAT_VECTOR became generally available.
Differential Revision: https://reviews.llvm.org/D118469
This refactors some code dealing with setting Wasm symbol types.
Some of the code dealing with types was moved from
`WebAssemblyUtilities` to `WebAssemblyTypeUtilities`.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D118121
Currently the backend promotes the mask vector to an i8 vector and extracts an element from that. Instead, we can bitcast to a vector with wider elements, extract from it into a GPR, and then use scalar integer instructions to extract the desired bit.
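A scalar sketch of the idea (the helper name and the container width are illustrative assumptions): once the mask bits are in a GPR, the wanted bit is just a shift and mask.

```
#include <cstdint>

// Extract the Idx-th mask bit from mask bits that have been moved into a
// scalar register via a wider-element bitcast and extract.
bool extractMaskBit(uint64_t MaskBits, unsigned Idx) {
  return (MaskBits >> Idx) & 1;
}
```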
Differential Revision: https://reviews.llvm.org/D117389
Currently, the clang.arc.attachedcall bundle takes an optional function
argument. Depending on whether the argument is present, calls with this
bundle have the following semantics:
- on x86, with the argument present, the call is lowered to:
call _target
mov rax, rdi
call _objc_retainAutoreleasedReturnValue
- on AArch64, without the argument, the call is lowered to:
bl _target
mov x29, x29
and the objc runtime call is expected to be emitted separately.
That's because, on x86, the objc runtime checks for both the mov and
the call, and treats the combination as the ARC autorelease elision
marker.
But on AArch64, it only checks for the dedicated NOP marker, as that's
historically been sufficiently unique. Thanks to that, the runtime call
wasn't required to be adjacent to the NOP marker, so it wasn't emitted
as part of the bundle sequence.
This patch unifies both architectures: on AArch64, we now emit all
3 instructions for the bundle. This guarantees that the runtime call
is adjacent to the marker in the sequence, and that's information the
runtime can use to further optimize this.
This helps simplify some of the handling, in particular
BundledRetainClaimRVs, which no longer needs to know whether the bundle
is sufficient or not: it now always should be.
Note that this does not include an AutoUpgrade for the nullary bundles,
as they are only produced in ObjCContract as part of the obj/asm emission
pipeline, and are not expected to be in bitcode.
Differential Revision: https://reviews.llvm.org/D118214
We were creating a truncate with the default for the type, but for
VP intrinsics we have a VL that we should use.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118406
Fix the pseudos to have the correct size in the MCInstrDesc description.
Inspired by D118009 and D117970.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118175
`__gnu_h2f_ieee` and `__gnu_f2h_ieee` were introduced by ARM and are set as
the default names for fp16/fp32 conversions in LLVM.
However, RISC-V GCC uses the generic naming scheme for those conversions,
`__extendhfsf2` and `__truncsfhf2`, which causes a runtime ABI
incompatibility.
Although we don't have a formal runtime ABI spec to specify those naming
conventions yet, I think it would be great to fix the incompatibility
first.
I plan to create a runtime ABI spec under the psABI spec this year.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118207
extract_vec_elt (load X), C --> scalar load (X+C)
As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.
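In source terms, the effect of the fold is roughly the following (a hand-written illustration, not the actual test from vec_int_to_fp.ll):

```
// Before the fold: load the whole <4 x float> at X, then extract lane 2.
// After the fold:  a single scalar load from X + 2.
float extractLane2(const float *X) {
  return X[2];
}
```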
Fixes #50310
Differential Revision: https://reviews.llvm.org/D118376
Where the instruction mnemonic contains a dot, we name the corresponding
instruction in the .td file using a _ in the place of the dot. e.g. LR_W
rather than LRW. This commit updates RISCVInstrInfoZb.td to follow that
convention.
This patch updates how splat loads are handled and is an extension of D106555.
In particular, v2i64/v4f32/v4i32 types are updated to handle only
non-extending loads, while v8i16/v16i8 types are updated to handle extending
loads only if the memory VT is the same as the vector element VT.
A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT
node should not be produced. The test shows the following f64 extending load
used in a v2f64 build vector, where the extending load is also used in places
other than the build vector (such as in t12 and t16).
```
Type-legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
t0: ch = EntryToken
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64
t16: f64 = fadd t31, t37
t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64
t36: ch = TokenFactor t34, t37:1
t27: v2f64 = BUILD_VECTOR t37, t37
t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27
t12: f64 = fadd t11, t37
t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64
t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64
t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1
```
Differential Revision: https://reviews.llvm.org/D117803
When a comparison is extended and it would be free to extend the
arguments to that comparison, we can propagate the extend into those arguments.
This prevents extra instructions from being generated to extend the result of the
comparison, which is not free to extend.
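As an example of the kind of code that benefits (my own illustration, not taken from the patch's tests): on AArch64 the narrow loads below can be sign-extended for free, so the extend can be pushed into the compare operands instead of extending the compare result.

```
#include <cstdint>

int64_t compareAndExtend(const int32_t *A, const int32_t *B) {
  // The i32 compare result is extended to i64; with the fold, the operands
  // are extended instead and the compare is done at 64 bits.
  return static_cast<int64_t>(*A > *B);
}
```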
This is a resubmission of D116812 with fixes that need another review.
Differential Revision: https://reviews.llvm.org/D118139
Adds patterns of the form "(and a, (not b)) -> bic".
NOTE: With this support I'm inclined to remove AArch64ISD::BIC,
but will leave that investigation for another time.
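For illustration, the scalar form of the pattern (the new patterns operate on SVE vectors, but the identity is the same):

```
#include <cstdint>

// and a, (not b)  ->  bic a, b
uint32_t andNot(uint32_t A, uint32_t B) {
  return A & ~B;
}
```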
Differential Revision: https://reviews.llvm.org/D118365
Previous folds by combineSetCCMOVMSK might have converted these to CMP when changing the bitwidth, and the CMP->SUB fold might not have happened yet (or might never happen).
rG9103b73fe052 assumed that we could OR/AND with the source vector, but that will fail on float/double vectors without bitcasting. It also missed the case where any_of checks might be testing fewer than all of the source elements.
`instrprof-icall-promo.test` `FAIL`s on Solaris/sparcv9:
Profile-sparc :: instrprof-icall-promo.test
Profile-sparcv9 :: instrprof-icall-promo.test
when compiling `compiler-rt/test/profile/Inputs/instrprof-icall-promo_2.cpp` with
fatal error: error in backend: Relocation for CG Profile could not be created: unknown relocation name
This happens because the Sparc backend doesn't implement `BFD_RELOC_NONE`.
This patch fixes that, following what X86 does.
Tested on `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D118136
Inspecting the pointer element type here is incompatible with
opaque pointers, and doesn't seem necessary to me. I think the
intention might have been to check the type of load/store pointer
arguments, but I believe those should get checked through their
return type or value operand anyway. I don't get any test failures
if I simply drop this.
Differential Revision: https://reviews.llvm.org/D118353
This is similar to D116619, but now it handles `invoke`s. The reason we
didn't handle `invoke`s back then was we didn't support Wasm EH + Wasm
SjLj together, and the only case SjLj transformation will see `invoke`s
is when we are using Wasm EH. (In Emscripten EH, they would have been
transformed to `call`s to invoke wrappers.)
But after D117610 we support Wasm EH + Wasm SjLj together and we can
nullify `invoke`s to `setjmp` when there are no other longjmpable calls
within the function. Actually this is very unlikely to happen in
practice, because we treat destructors as longjmpable and also treat
`__cxa_end_catch` as longjmpable even if it is not.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D118408
Wasm SjLj converts longjmpable calls into `invoke`s that unwind to
`%catch.dispatch.longjmp` BB, from where we check if the thrown
exception is a `longjmp`. But in case a call already has a `funclet`
attribute, i.e., it is within a catch scope, we have to unwind to its
unwind destination first to preserve the scoping structure. That will
eventually unwind to `%catch.dispatch.longjmp`, because all
`catchswitch` and `cleanupret` that unwind to caller are redirected to
`%catch.dispatch.longjmp` during Wasm SjLj transformation.
But the previous code assumed a `cleanuppad`'s parent pad was always an
instruction, and didn't handle the case where a `cleanuppad`'s parent is `none`.
This CL handles that case, and makes the `while` loop more intuitive by
removing the `FromPad` condition and explicitly inserting `break`s.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D118407
Wasm EH, used with either of Emscripten SjLj or Wasm SjLj, does not
allow `setjmp` calls to be placed within a `catch` clause, because we
don't support jumping into a `catch` block. Emscripten EH does not have
this restriction. But I think this restriction wouldn't prevent most
use cases. This CL errors out with a clear message for this case.
Reviewed By: dschuff, kripken
Differential Revision: https://reviews.llvm.org/D118286
When we create an invoke wrapper call, if the original call instruction
has a `noreturn` attribute, we shouldn't copy it, because we expect
invoke wrapper calls to return. This generated an incorrect `free` call
before an invoke wrapper call that calls `__cxa_throw`, because
`__cxa_throw` has the `noreturn` attribute.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D118274
In D117610 we treated `__cxa_end_catch` as longjmpable, even though it is
not, in order to make unwind destination relationships correct. But we only
need to do this in Wasm SjLj; doing it in Emscripten SjLj does not make the
code incorrect but adds unnecessary invokes. This CL treats
`__cxa_end_catch` as longjmpable only in Wasm SjLj.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D117943
According to riscv-v-spec-1.0, the widening signed(vs2)-unsigned integer multiply is:
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar
It is worth noting that only vs2 is the signed operand.
For vwmulsu.vv we can swap the two operands, since it does not matter which one
is the sign-extended input, but for vwmulsu.vx the sign-extended operand cannot
be the vector extended from the scalar (rs1).
I specifically added two functions ending with _swap to the test case.
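A per-element model (SEW=32 assumed for illustration) of why swapping is fine for the .vv form: only the value roles matter, not which source register they came from, whereas in the .vx form the scalar rs1 is always the zero-extended operand.

```
#include <cstdint>

// vwmulsu per-element semantics: sign-extend vs2, zero-extend vs1/rs1,
// multiply into the double-width result.
int64_t vwmulsuElem(int32_t Vs2, uint32_t Vs1) {
  return static_cast<int64_t>(Vs2) * static_cast<int64_t>(Vs1);
}
```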
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118215
Return false from ShouldShrinkFPConstant(), so that these constants are stored
at their full size in the constant pool, even if they could have been shrunk
and used with an extending load.
This is better since LD is faster than LDE, and it also enables reg/mem opcodes.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D117927
By reordering the objects on the stack frame after looking at the users,
displacement operands can be utilized better. This means fewer Load Address
instructions are needed to access these objects.
This is important for very large functions, where otherwise small changes
could cause many more (or fewer) accesses to go out of range.
Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D115690
Since its introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.
Fixes #53120, #50905 and #51761.
Differential Revision: https://reviews.llvm.org/D117592
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them
later.
Differential Revision: https://reviews.llvm.org/D118368
Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list and then later
refusing to merge instructions of different AGPR-ness.
Differential Revision: https://reviews.llvm.org/D118367
Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and STM.
Differential Revision: https://reviews.llvm.org/D118366
This adds the following changes:
* Fold: vselect(<all active predicate>, x, y) => x
* Extend isAllActivePredicate to take vscale_range into account, e.g.
isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true.
isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true.
Differential Revision: https://reviews.llvm.org/D118147
InstCombine performs this more generally with SimplifyUsingDistributiveLaws, but we don't need anything that complex here - this is mainly to fix up cases where logic ops get created late during lowering, often in conjunction with sext/zext ops for type legalization.
https://alive2.llvm.org/ce/z/gGpY5v
Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening instructions happens
to be a swizzled access.
This moves the check for swizzled accesses out of checkAndPrepareMerge
into collectMergeableInsts where I think it makes more sense.
Differential Revision: https://reviews.llvm.org/D118267
The ISel patterns for PFALSE help recognise the instructions as being
free of side-effects, which helps MachineCSE remove redundant
PFALSE instructions.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118054
SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so there is no
need to check for them again when scanning the instructions between two
pairing candidates in a block.
Differential Revision: https://reviews.llvm.org/D118266
This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.
This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.
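A scalar sketch of the expansion described above (the lane count, element type, and helper name are illustrative assumptions):

```
#include <cstdint>
#include <vector>

// vp.merge(Mask, OnTrue, OnFalse, EVL): lanes past EVL take OnFalse.
// The expansion builds an "index < EVL" mask, ANDs it with the original
// mask, and finishes with a full-length select.
std::vector<int32_t> vpMerge(const std::vector<bool> &Mask,
                             const std::vector<int32_t> &OnTrue,
                             const std::vector<int32_t> &OnFalse,
                             unsigned EVL) {
  std::vector<int32_t> Result(OnTrue.size());
  for (unsigned I = 0; I < OnTrue.size(); ++I) {
    bool Lane = Mask[I] && I < EVL;            // combined mask
    Result[I] = Lane ? OnTrue[I] : OnFalse[I]; // full-length select
  }
  return Result;
}
```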
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118058
The CSKY arch has multiple FPU instruction versions, such as FPU, FPUv2 and FPUv3, to implement floating-point operations.
For now, we only support FPUv2 and FPUv3.
It includes the encoding, asm parsing of instructions and codegen of DAG nodes.
This matches what the spec uses for the vncvt.x.x.w assembly
pseudoinstruction.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D118295
From the MAI spec: It's OK for Src_C and vDst to be the exact same VGPRs,
or for Src_C and vDst to be completely separate. The case where Src_C and vDst
overlap should be avoided, as a new value could be written to the accumulator
input before it gets read.
Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.
This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.
Fixes: SWDEV-318900
Differential Revision: https://reviews.llvm.org/D117844
Most importantly, fixes constant bus errors in the 64-bit cases. It's
surprising to me these were even passing the selection test using
SReg_* sources. Also fixes pattern matching in the 32-bit cases, with
simple operands.
These patterns aren't working in a few cases, like with mixed SGPR
inputs. The patterns aren't looking through the SGPR->VGPR copies like
they need to. The vector cases also have some unmerges of build_vector
which are obscuring the inputs.
For Zba/Zbb/Zbc/Zbs I've removed the 'B' completely and used the
extension names as presented at the start of Chapter 1 of the
1.0.0 Bitmanipulation spec.
For the unratified extensions, I've replaced 'B' with 'Zb' and
otherwise left them unchanged.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D117822
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
In commit 1674d9b6b2, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely, for eq we need both words of the doubleword to be equal, so it
is an AND. OTOH, for ne we need either word to be unequal, so it
is an OR.
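In scalar terms, the required combination is the following (an illustration of the logic, not the actual PPC lowering code):

```
#include <cstdint>

// eq: both 32-bit words must match, so AND the per-word results.
bool eq64(uint32_t ALo, uint32_t AHi, uint32_t BLo, uint32_t BHi) {
  return (ALo == BLo) && (AHi == BHi);
}

// ne: either word differing is enough, so OR the per-word results.
bool ne64(uint32_t ALo, uint32_t AHi, uint32_t BLo, uint32_t BHi) {
  return (ALo != BLo) || (AHi != BHi);
}
```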
AMDGPUHSAMetadataStreamer currently assumes that pointer arguments
without align attribute have ABI alignment of the pointee type.
This is incompatible with opaque pointers, but also plain incorrect:
Pointer arguments without explicit alignment have alignment 1. It is
the responsibility of the frontend to add correct align annotations.
Differential Revision: https://reviews.llvm.org/D118229
We have some bitcasts which we know will be simplified,
so their cost is zero.
Reviewed By: david-arm, sdesmalen
Differential Revision: https://reviews.llvm.org/D118019
Packed-mode broadcast of f32/i32 requires the subregister to be
replicated to the full I64 register first. Add repl_i32 and repl_f32 to
facilitate this.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D117878
Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence.
This relies on a further custom selection pass which converts to VALU if necessary and replaces it with V_NOT_B32 (V_XOR_B32)
on those targets which have no V_XNOR.
This change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form.
We assume that xor (not) has already been turned into not (xor) by the combiner.
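The identity behind that canonicalization, written out as a scalar sketch:

```
#include <cstdint>

// xor with a negated operand is the same as negating the xor, which is
// what the not (xor_one_use) pattern (and ultimately XNOR) matches.
uint32_t xnorFromXorNot(uint32_t A, uint32_t B) { return (~A) ^ B; }
uint32_t xnorFromNotXor(uint32_t A, uint32_t B) { return ~(A ^ B); }
```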
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116270
NOTE: Only considers i64 based vectors at this time because smaller
element types require extra isel operand parsing.
Differential Revision: https://reviews.llvm.org/D118040
When wider vectors are used, for example with fixed-width SVE,
there are no patterns to select AArch64ISD::LD1LANEpost
nodes, so we should do an early exit.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D117674
In the Zve* extensions, the vlen could be 64. This patch changes the lower bound of the vlen constraint to 64.
Differential Revision: https://reviews.llvm.org/D118217
According to GNU as documentation, PowerPC supports some .gnu_attribute
tags to represent the vector and float ABI type in the object file.
Some linkers like GNU ld respect the attribute and will prevent objects
with conflicting ABIs from being linked.
This patch emits the gnu_attribute value in the assembly printer according to
the float-abi metadata. More attributes for soft-fp, hard single/double
and even vector ABI need to be supported in the future.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D117193