There was no pattern to fold into these instructions. This patch adds
the pattern obtained from the following ACLE intrinsics so that they
generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and
sqadd/sqsub instructions:
- vqdmlalh_s16, vqdmlslh_s16
- vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16,
vqdmlslh_laneq_s16 (when the lane index is 0)
It also modifies the result of the existing pattern for the latter, when
the lane index is not 0, to use the v1i32_indexed instructions instead
of the v4i16_indexed ones.
Fixes#49997.
Differential Revision: https://reviews.llvm.org/D131700
As discussed in D85414 <https://reviews.llvm.org/D85414>, two tests
currently `FAIL` on Sparc since that backend uses the Sun assembler syntax
for the `.section` directive, controlled by
`SunStyleELFSectionSwitchSyntax`.
Instead of adapting the affected tests, this patch changes that default.
The internal assembler still accepts both forms as input, only the output
syntax is affected.
Current support for the Sun syntax is cursory at best: the built-in
assembler cannot even assemble some of the directives emitted by GCC, and
the set supported by the Solaris assembler is even larger: SPARC Assembly
Language Reference Manual, 3.4 Pseudo-Op Attributes
<https://docs.oracle.com/cd/E37838_01/html/E61063/gmabi.html#scrolltoc>.
A few Sparc test cases need to be adjusted. At the same time, the patch
fixes the failures from D85414 <https://reviews.llvm.org/D85414>.
Tested on `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D85415
Add scheduling resource for vector segment load/store instructions in D128886.
I miss to add scheduling resource for pseudo segment instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130222
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.
Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferrable. (addi X, -1) can be compressed, sub with x0 on
the LHS is never compressible.
This introduce an xori in some cases. I don't believe it was the
intention of the original patch. This was an accident because
nonan FP equality compares also use SETEQ/SETNE.
Also pass the correct type to getSetCCInverse.
-Rename variable NnzC -> N0C.
-Use SelectionDAG::getSetCC to reduce code.
-Use SDValue::getOperand instead of operator-> and SDNode::getOperand.
Initial steps to add another similar combine to this code.
Currenlty all temporal loads are mapped to `LDP` or `LDR`. This patch will map all the non temporal 256-bit loads into `LDNP`. Future patches should address other non-temporal loads.
Reviewed By: fhahn, dmgreen
Differential Revision: https://reviews.llvm.org/D131773
There is an existing mechanism to escape strings, therefore the
functions created to escape Tag_also_compatible_with values are not
really needed. We can simply use the pre-existing utilities.
Reviewed By: pratlucas
Differential Revision: https://reviews.llvm.org/D131680
Previously, LegaizeDAG didn't check mask.compress's passthrough might be float, and this lead to getConstant crash since it doesn't support fp
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D131947
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.
This appears to be a net code size reduction on SPECINT2006.
This has been split from D130397 as one of the patches needed to
complete that.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131819
I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first.
Fixes regressions from D106675.
R1 is a reserved register, but LLVM gives the APIs to know when it is
used or not. So this patch uses these APIs to only save/clear/restore R1
in interrupts when necessary.
The main issue here was getting inline assembly to work. One could argue
that this is the job of Clang, but for consistency I've made sure that
R1 is always usable in inline assembly even if that means clearing it
when it might not be needed.
Information on inline assembly in AVR can be found here:
https://www.nongnu.org/avr-libc/user-manual/inline_asm.html#asm_code
Essentially, this seems to suggest that r1 can be freely used in avr-gcc
inline assembly, even without specifying it as an input operand.
Differential Revision: https://reviews.llvm.org/D117426
The code to support the case when the register allocator has assigned
the same register to the src and the dst register operand isn't actually
needed:
* LDWRdPtr and LDDWRdPtrQ have an @earlyclobber on the output
register, so the register allocator will make sure to allocate a
different register for the output register.
* LDDWRdYQ does not have an @earlyclobber, but the pointer register is
the fixed Y register which is reserved. The register allocator won't
use reserved registers for the output value.
This removes a special case in the code that makes the pseudo
instruction expansion pass more complicated than it needs to be.
Differential Revision: https://reviews.llvm.org/D131844
This is to avoid f16->i64 being lowered to `__fixhfdi/__fixunshfdi` on 32-bits since neither libgcc nor compiler-rt provide them. https://godbolt.org/z/cjWEsea5v
It also helps to improve the performance by promoting the vector type.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D131828
This reverts commit 38c2366b3f.
This patch seems to break boostraping LLVM with `-fglobal-isel -O3`
on AArch64 hardware. Without the revert, there are 500+ test
failures for the `check-llvm-codegen-x86` target.
(setcc x, y, eq/neq) are seqz, snez that set rd = 0/1.
addi is used to process immediate, which can save instructions for load immediate.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131471
The linker may convert such an ADD into a LEA, so we must not
use the EFLAGS output.
This causes miscompiles with -fsanitize=null after
bacdf80f42 added
llvm.threadlocal.address -- previously, global variables were known to
be non-null, but the intrinsic is not currently known to return
nonnull. (That should be corrected, but it shouldn't've caused
miscompiles!)
Differential Revision: https://reviews.llvm.org/D131716
This patch supports SPIR-V capabilities and extensions. In addition,
it inserts decorations related to MIFlags and improves support of switches.
Five tests are included to demonstrate the improvement.
Differential Revision: https://reviews.llvm.org/D131221
Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
Since c5b3de6745 (git main,
August 11th), Clang does generate working hidden visibility
on MinGW targets. Using that reduces the number of exports from
a dylib build of LLVM significantly, which is vital for fitting
within the limit of 64k exported symbols from a DLL.
It's essential that if we set CMAKE_CXX_VISIBILITY_PRESET=hidden
(which passes -fvisibility=hidden on the command line), we also
must define LLVM_EXTERNAL_VISIBILITY consistently to override
it. (If there are mismatches, e.g. setting hidden visibility generally
but never overriding it back to default for the symbols that do need
to be exported, we'd get broken builds in such configurations.)
We don't want to be using __attribute__((visibility("hidden"))) on
MinGW with GCC, because GCC produces a warning about it. (GCC hasn't
warned about the command line options that set hidden visibility
though.) Clang has historically not warned about either of them, so
it is harmless to use the hidden visibility when building with older
Clang (so we don't need to detect the exact version of Clang/LLVM where
it has an effect).
This reduces the number of exported symbols for a dylib build of LLVM;
previously libLLVM exported around 64650 symbols (when the maximum is
65536) when the ARM, AArch64 and X86 targets were enabled. If enabling
more targets (or if building with e.g. assertions enabled), it would
exceed the limit. Now with visibility flags in use, the same build
with ARM, AArch64 and X86 ends up at around 35k exported symbols.
Differential Revision: https://reviews.llvm.org/D131661
Including patterns to select addiw if only the lower 32 bits are used.
I'm not excited about adding this many patterns. I'm looking at whether
we can create the xori during lowering and move the ineg patterns to
DAGCombiner.
lowerShuffleWithVPMOV currently only matches shuffle(truncate(x)) patterns, but on VLX targets the truncate isn't usually necessary to make the VPMOV node worthwhile (as we're only targetting v16i8/v8i16 shuffles we're almost always ending up with a PSHUFB node instead). PACKSS/PACKUS are still preferred vs VPMOV due to their lower uop count.
Fixes the remaining regression from the fixes in rG293899c64b75