Commit Graph

Phoebe Wang 190518da4b [X86] Use generic tuning for "x86-64" if "tune-cpu" is not specified
This is an alternative to D129154. See discussions on https://discourse.llvm.org/t/fast-scalar-fsqrt-tuning-in-x86/63605

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129647
2022-07-15 10:05:08 +08:00
Craig Topper 79016f6eef [RISCV] Refine the heuristics for our custom (mul (and X, C2), C1) isel.
Prefer to use SLLI instead of zext.w/zext.h in more cases. SLLI
might be better for compression.
2022-07-14 18:24:10 -07:00
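
A minimal C++ sketch (not from the commit) of the equivalence the heuristic relies on: zero extension is expressible as a shift pair, and the SLLI form has a chance of using the compressed c.slli encoding.

  #include <cstdint>
  // zext.w is an AND with 0xffffffff; the same value can be produced with
  // a shift pair, and the shifts can often fold into surrounding constants
  uint64_t zext_w(uint64_t x)   { return x & 0xffffffffULL; }
  uint64_t via_slli(uint64_t x) { return (x << 32) >> 32; } // same result
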
Craig Topper bc0d656558 [RISCV] Fix mistake in RISCVTTIImpl::getIntImmCostInst.
zext.w requires Zba not Zbb. The test was also wrong, but had the
correct comment.
2022-07-14 16:42:35 -07:00
jeff 8a12f20ef7 [AMDGPU] Update the mechanism used to check for cycles and add edges in power-sched mutation 2022-07-14 16:24:13 -07:00
Craig Topper 9913ea490a [RISCV] Make TuneSiFive7 depend on TuneNoDefaultUnroll instead of listing it for every SiFive7 CPU 2022-07-14 15:57:30 -07:00
Alexander Timofeev 2e29b0138c [AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable.
Since divergence-driven instruction selection was enabled for AMDGPU,
all uniform instructions are expected to be selected to SALU form, except those that have no SALU equivalent.
VGPR to SGPR copies appear in MIR to connect value producers and consumers. This change implements an algorithm
that strikes a reasonable tradeoff between the profit of keeping uniform instructions in SALU form
and the overhead introduced by data transfer between VGPRs and SGPRs.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128252
2022-07-14 23:59:02 +02:00
Craig Topper 1a8468ba61 [RISCV] Add a RISCV specific CodeGenPrepare pass.
The initial optimization is to convert (i64 (zext (i32 X))) to
(i64 (sext (i32 X))) if a dominating condition for the basic block
guarantees that the sign bit of X is zero.

This frequently occurs in loop preheaders where a signed induction
variable that can never be negative has been widened. There will be
a dominating check that the 32-bit trip count isn't negative or zero.
The check here is not restricted to that specific case though.

An i32->i64 sext is cheaper than zext on RV64 without the Zba
extension. Later optimizations can often remove the sext from the
preheader basic block because the dominating block also needs a sext to
evaluate the greater than 0 check.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129732
2022-07-14 10:20:59 -07:00
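
A hedged C++ sketch of the kind of source this fires on (function and variable names are illustrative): the guard proves n > 0, so the mid-end canonicalizes the i32->i64 extension of the trip count to zext, and the new pass converts it back to the cheaper sext.

  #include <cstdint>
  void fill(int32_t n, int64_t *a) {
    if (n <= 0)
      return;                       // dominating check: beyond here n > 0
    for (int64_t i = 0; i < n; ++i) // widened induction variable; the
      a[i] = i;                     // i32->i64 extension of n is the target
  }
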
Fraser Cormack d1a5669f5e [RISCV] Disable subregister liveness by default
We previously enabled subregister liveness by default when compiling
with RVV. This has been shown to cause miscompilations where RVV
register operand constraints are not met. A test was added for this in
D129639 which explains the issue in more detail.

Until this issue is fixed in some way, we should not be enabling
subregister liveness unless the user asks for it.

Reviewed By: craig.topper, rogfer01, kito-cheng

Differential Revision: https://reviews.llvm.org/D129646
2022-07-14 17:04:10 +01:00
Nikita Popov dcf4b733ef [SCEVExpander] Make CanonicalMode handling in isSafeToExpand() more robust (PR50506)
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.

Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.

Fixes https://github.com/llvm/llvm-project/issues/50506.

Differential Revision: https://reviews.llvm.org/D129630
2022-07-14 14:41:51 +02:00
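
A hedged sketch of the caller-side difference (shapes assumed from the description above, not the exact upstream signatures):

  // before: freestanding helper; the caller had to get CanonicalMode right
  //   if (isSafeToExpand(S, SE)) ...
  // after: the member form takes the mode from the expander itself
  SCEVExpander Expander(SE, DL, "scev-expander");
  if (Expander.isSafeToExpand(S))
    Expander.expandCodeFor(S, Ty, InsertPt);
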
Jay Foad e45aa230ad [AMDGPU] Update LiveVariables after killing an immediate def
D114999 added code to kill an immediate def if it was folded into its
only use by convertToThreeAddress. This patch updates LiveVariables when
that happens in order to fix verification failures exposed by D129213.

Differential Revision: https://reviews.llvm.org/D129661
2022-07-14 10:49:41 +01:00
Cullen Rhodes 055b409cea [AArch64][NFC] Drop 'V' from ASIMD FP convert, other, D/Q-form regex
In the Cortex A57 Optimization Guide [1] VCVTAU (AArch32) is incorrectly
listed in the AArch64 instructions for instruction groups:

  - ASIMD FP convert, other, D-form
  - ASIMD FP convert, other, Q-form

It's meant to be FCVTAU, this will be fixed in future releases of the guide.

[1] https://developer.arm.com/documentation/uan0015/b
2022-07-14 09:32:20 +00:00
Weining Lu 9b87ad33c1 [LoongArch] Implement OR combination to generate bstrins.w/d
Differential Revision: https://reviews.llvm.org/D129357
2022-07-14 17:20:43 +08:00
David Green 3e0bf1c7a9 [CodeGen] Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly, which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo.  This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Recommitted with some fixes for the leftover MCII variables in release
builds.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-14 09:33:28 +01:00
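
A hedged sketch of the new call site described above, using X86 as the example (exact parameter spellings assumed):

  // <Target>AsmPrinter::emitInstruction now verifies predicates for every
  // instruction, so both assembly and object emission are covered
  void X86AsmPrinter::emitInstruction(const MachineInstr *MI) {
    X86_MC::verifyInstructionPredicates(MI->getOpcode(),
                                        getSubtargetInfo().getFeatureBits());
    // ... lower and emit MI as before ...
  }
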
Jannik Silvanus e5c4cde451 [AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.

* Support cl::opt -view-misched-dags
  This option allows opening a graphical window of the scheduling DAG.

* Support cl::opt -misched-print-dags
  This option allows printing the scheduling DAG in text form.

* After constructing the scheduling DAG, call postprocessDAG()
  to apply any registered DAG mutations.
  Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
  in case SIScheduler is used.
  Still add this to avoid surprises in the future in case mutations are added.

Differential Revision: https://reviews.llvm.org/D128808
2022-07-14 09:45:31 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
owenca 18a910dfba [llvm] Make lib/Target/BPF/BTF.h self-contained 2022-07-13 20:54:26 -07:00
Zi Xuan Wu (Zeson) 033324db6f [CSKY] Fix the br target operand type in td
br target operand should be Operand<OtherVT> type instead of Operand<iPTR>
2022-07-14 11:27:31 +08:00
Yonghong Song 6e6c1efe04 [BPF] Handle anon record for CO-RE relocations
While experimenting in the kernel with the kernel data structure sockptr_t
in a CO-RE operation, I hit an assertion error. The sockptr_t definition
and usage look like below:
  #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
  typedef struct {
        union {
                void    *kernel;
                void    *user;
        };
        unsigned is_kernel : 1;
  } sockptr_t;
  #pragma clang attribute pop
  int test(sockptr_t *arg) {
    return arg->is_kernel;
  }
The assertion error looks like
  clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:878: llvm::Value*
   {anonymous}::BPFAbstractMemberAccess::computeBaseAndAccessKey(llvm::CallInst*,
   {anonymous}::BPFAbstractMemberAccess::CallInfo&, std::__cxx11::string&,
   llvm::MDNode*&): Assertion `TypeName.size()' failed.

In this particular case, the clang frontend attaches the debuginfo metadata associated
with the anon structure to the preserve_access_index IR intrinsic. But the first
debuginfo type has to be a named type so that libbpf has a sound starting point
for CO-RE relocation.

Besides the above approach of using a pragma to push the attribute, the typedef
definition below can apply preserve_access_index directly to the anon struct.
  typedef struct {
        union {
                void    *kernel;
                void    *user;
        };
        unsigned is_kernel : 1;
  } __attribute__((preserve_access_index)) sockptr_t;

This patch fixes the issue by preprocessing function argument/return types
and local variable types used by other CO-RE intrinsics. For any
   typedef struct/union { ... } typedef_name
an association of <anon struct/union, typedef> is recorded to replace
the IR intrinsic metadata 'anon struct/union' with 'typedef'.
It is possible that two different 'typedef' types have an identical
anon struct/union type. For such a case, the association will be
<anon struct/union, nullptr> to indicate the invalid case.

Differential Revision: https://reviews.llvm.org/D129621
2022-07-13 15:16:16 -07:00
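
A self-contained C++ sketch of the <anon struct/union, typedef> bookkeeping described above (AnonId and the map are illustrative stand-ins for the real debuginfo nodes):

  #include <string>
  #include <unordered_map>
  using AnonId = const void *; // stands in for the anon record's DI node
  std::unordered_map<AnonId, std::string> AnonToTypedef;
  void recordTypedef(AnonId Anon, const std::string &Name) {
    auto R = AnonToTypedef.emplace(Anon, Name);
    if (!R.second && R.first->second != Name)
      R.first->second.clear(); // two typedefs share one anon type: invalid
  }
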
Craig Topper 257755530a [RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32).
The former pattern will select as slliw+sraiw while the latter
will select as slli+srai. This can enable the slli+srai to be
compressed.

Differential Revision: https://reviews.llvm.org/D129688
2022-07-13 14:34:17 -07:00
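
A C++ equivalence check for the fold (a sketch; assumes C1, C2 < 32 and the usual wrap-on-narrowing conversions):

  #include <cstdint>
  int64_t before(uint64_t x, unsigned c1, unsigned c2) {
    int32_t t = (int32_t)(x << c1);  // (sext_inreg (shl x, c1), i32)
    return (int64_t)t >> c2;         // selects as slliw+sraiw
  }
  int64_t after(uint64_t x, unsigned c1, unsigned c2) {
    return (int64_t)(x << (c1 + 32)) >> (c2 + 32); // slli+srai, compressible
  }
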
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (Except for VLEN<=32, but that configuration is already broken.)

It is worth noting that the AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
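
The strength reduction this enables, as a minimal C++ sketch: for a power-of-two divisor, urem is a single AND.

  #include <cassert>
  #include <cstdint>
  uint64_t urem_pow2(uint64_t x, uint64_t n) {
    assert(n != 0 && (n & (n - 1)) == 0 && "n must be a power of two");
    return x & (n - 1); // == x % n when n is a power of two
  }
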
David Green 95252133e1 Revert "Move instruction predicate verification to emitInstruction"
This reverts commit e2fb8c0f4b as it does
not build in Release configurations, and some buildbots are giving more warnings
than I saw locally. Reverting to fix those issues.
2022-07-13 13:28:11 +01:00
David Green e2fb8c0f4b Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly, which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo.  This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-13 12:53:32 +01:00
Cullen Rhodes 7c3cda551a [AArch64][SVE] Prefer SIMD&FP variant of clast[ab]
The scalar variant with GPR source/dest has considerably higher latency
than the SIMD&FP scalar variant across a variety of micro-architectures:

  Core           Scalar    SIMD&FP
  --------------------------------
  Neoverse V1     9 cyc      3 cyc
  Neoverse N2     8 cyc      3 cyc
  Cortex A510     8 cyc      4 cyc
  A64FX          29 cyc      6 cyc
2022-07-13 08:53:36 +00:00
Fraser Cormack b336cf856e [RISCV] Add early-exit to RVV stack computation. NFCI.
This patch was split off from D126465, where an early exit is necessary
because the code checks the VLEN, which asserts that V instructions are present.

Since this makes logical sense on its own, I think it's worth landing
regardless of D126465.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D129617
2022-07-13 08:50:08 +01:00
gonglingqin 47f3dc6d49 [LoongArch] Add codegen support for atomic fence, atomic load and atomic store
Differential Revision: https://reviews.llvm.org/D128901
2022-07-13 15:25:45 +08:00
gonglingqin e147a0f65a [LoongArch] Add codegen support for converting between unsigned integer and floating-point
Differential Revision: https://reviews.llvm.org/D128900
2022-07-13 15:25:44 +08:00
gonglingqin 1df96ce518 [LoongArch] Add codegen support for fpround, fpextend and converting between signed integer and floating-point
Differential Revision: https://reviews.llvm.org/D128899
2022-07-13 15:25:43 +08:00
Monk Chiang 2b045324b2 [RISCV] Add scheduling resources for vector segment instructions.
Add scheduling resources for vector segment instructions

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128886
2022-07-12 22:51:58 -07:00
Simon Pilgrim 66bfd1ba8c [X86] Move isInRange(ArrayRef<int>) inside assert to fix NDEBUG builds. NFC.
Fix unused static function warning introduced by D129207
2022-07-12 21:51:07 +01:00
Nick Desaulniers 2240d72f15 [X86] initial -mfunction-return=thunk-extern support
Adds support for:
* `-mfunction-return=<value>` command line flag, and
* `__attribute__((function_return("<value>")))` function attribute

Where the supported <value>s are:
* keep (disable)
* thunk-extern (enable)

thunk-extern enables clang to change ret instructions into jmps to an
external symbol named __x86_return_thunk, implemented as a new
MachineFunctionPass named "x86-return-thunks", keyed off the new IR
attribute fn_ret_thunk_extern.

The symbol __x86_return_thunk is expected to be provided by the runtime
the compiled code is linked against and is not defined by the compiler.
Enabling this option alone doesn't provide mitigations without
corresponding definitions of __x86_return_thunk!

This new MachineFunctionPass is very similar to "x86-lvi-ret".

The <value>s "thunk" and "thunk-inline" are currently unsupported. It's
not clear yet that they are necessary: whether the thunk pattern they
would emit is beneficial or used anywhere.

Should the <value>s "thunk" and "thunk-inline" become necessary,
x86-return-thunks could probably be merged into x86-retpoline-thunks
which has pre-existing machinery for emitting thunks (which could be
used to implement the <value> "thunk").

Has been found to build+boot with corresponding Linux
kernel patches. This helps the Linux kernel mitigate RETBLEED.
* CVE-2022-23816
* CVE-2022-28693
* CVE-2022-29901

See also:
* "RETBLEED: Arbitrary Speculative Code Execution with Return
Instructions."
* AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion
* TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0
  2022-07-12
* Return Stack Buffer Underflow /
  CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702

SystemZ may eventually want to support "thunk-extern" and "thunk"; both
options are used by the Linux kernel's CONFIG_EXPOLINE.

This functionality has been available in GCC since the 8.1 release, and
was backported to the 7.3 release.

Many thanks to folks who provided discrete review off list due to the
embargoed nature of this hardware vulnerability. Many Bothans died to
bring us this information.

Link: https://www.youtube.com/watch?v=IF6HbCKQHK8
Link: https://github.com/llvm/llvm-project/issues/54404
Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html
Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html

Reviewed By: aaron.ballman, craig.topper

Differential Revision: https://reviews.llvm.org/D129572
2022-07-12 09:17:54 -07:00
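
A usage sketch based on the flag and attribute named above (the thunk body itself must be supplied by whatever the object is linked against, e.g. the kernel):

  // compile with: clang -mfunction-return=thunk-extern file.c
  __attribute__((function_return("thunk-extern")))
  void protected_fn(void) {}   // 'ret' is emitted as 'jmp __x86_return_thunk'

  __attribute__((function_return("keep")))
  void unprotected_fn(void) {} // keeps a plain 'ret'
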
Jay Foad 5d41fe0768 [AMDGPU] SILowerControlFlow uses LiveIntervals
The availability of LiveIntervals affects kill flags in the output, so
declare the use to avoid strange effects where the output of this pass
is different depending on what other passes are scheduled after it.

Differential Revision: https://reviews.llvm.org/D129555
2022-07-12 16:53:53 +01:00
Igor Kudrin 9ff10a0d62 [NVPTX] Add missing pass names
Differential Revision:
2022-07-12 07:58:13 -07:00
Rosie Sumpter e5edc1b5ee [AArch64][SVE] Ensure PTEST operands have type nxv16i1
Currently any legal predicate types will be pattern-matched when
creating a PTEST instruction. This could be a problem in future since
PTEST always uses the .B specifier for the operand, but it is not
always guaranteed that the extra lanes of unpacked types (e.g. nxv4i1)
are zero. This patch ensures the operands of PTEST are type nxv16i1,
where the undef lanes are set to zero.

Differential Revision: https://reviews.llvm.org/D129282/
2022-07-12 09:27:59 +01:00
Craig Topper c5be6a8308 [RISCV] Use X0 in place of VLMaxSentinel in lowering.
I thought I had already fixed all of these, but I guess I missed one.
2022-07-11 23:29:04 -07:00
Xiang1 Zhang a45dd3d814 [X86] Support -mstack-protector-guard-symbol
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129346
2022-07-12 10:17:00 +08:00
Xiang1 Zhang 643786213b Revert "[X86] Support -mstack-protector-guard-symbol"
This reverts commit efbaad1c4a
due to missing review info.
2022-07-12 10:14:32 +08:00
Xiang1 Zhang efbaad1c4a [X86] Support -mstack-protector-guard-symbol 2022-07-12 10:13:48 +08:00
Craig Topper c3c17b1695 [RISCV] Use MVT for the argument to getMaskTypeFor. NFC
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
2022-07-11 15:14:44 -07:00
Piotr Sobczak 2bd8e74b94 [AMDGPU] Fix bitcast v4i64/v16i16
Fix a regression introduced in D128865.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129375
2022-07-11 22:27:52 +02:00
Craig Topper 759e5e0096 [RISCV] Remove doPeepholeLoadStoreADDI.
All of the cases should be handled by SelectAddrRegImm now.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129451
2022-07-11 10:44:33 -07:00
Craig Topper 907d923a20 [RISCV] Move the custom isel for (add X, imm) into SelectAddrRegImm.
This custom isel was used to split the lo12 bits of the imm so that
they could be folded into load/store addresses via a post-isel
peephole.

This patch instead splits the immediate during isel and folds the
lo12 removing the need for the post-isel peephole to do anything.

After this we'll be able to remove the post-isel peephole.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129450
2022-07-11 10:44:33 -07:00
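
The split described above, as a small C++ sketch: imm = hi + lo with lo sign-extended from 12 bits, so lo can fold into the load/store offset while hi, whose low 12 bits are clear, is materialized separately.

  #include <cstdint>
  void splitImm(int64_t imm, int64_t &hi, int64_t &lo) {
    lo = (int64_t)((uint64_t)imm << 52) >> 52; // sign-extend low 12 bits
    hi = imm - lo;                             // low 12 bits of hi are zero
  }
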
spupyrev eecd41aa09 Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636a.
2022-07-11 09:50:47 -07:00
Craig Topper 1a2bd44b77 [RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true.
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.

To fix this correctly, we need to consider alignment of the load
we'd be replacing, but that's not possible in the current interface.
2022-07-11 09:40:08 -07:00
Rafael Auler 6d0528636a Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to the reliance on relaxation that
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of the NeverAlign fragment is to prevent
the first instruction in a pair from ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-07-11 09:31:52 -07:00
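
A minimal C++ sketch of the padding decision NeverAlign makes, per the description above (one byte is the minimal nop on x86; assumes the instruction is shorter than the boundary):

  #include <cstdint>
  // pad only if the first instruction of the pair would end exactly on the
  // boundary, which is what would split the macro-fusible pair
  uint64_t paddingBefore(uint64_t Offset, uint64_t Size, uint64_t Boundary) {
    return (Offset + Size) % Boundary == 0 ? 1 : 0;
  }
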
Simon Pilgrim 97868fb972 [X86] isTargetShuffleEquivalent - attempt to match SM_SentinelZero shuffle mask elements using known bits
If the combined shuffle mask requires zero elements, we don't currently have much chance of matching them against the expected source vector. This patch uses the SelectionDAG::MaskedVectorIsZero wrapper to attempt to determine if the expected element we want to use is already known to be zero.

I've also tightened up the ExpectedMask assertion to always be in range - we're never giving it a target shuffle mask that has sentinels at all - allowing us to remove some of the confusing bounds checks.

This attempts to address some of the regressions uncovered by D129150 where we more aggressively fold shuffles as AND / 'clear' masks which results in more combined shuffles using SM_SentinelZero.

Differential Revision: https://reviews.llvm.org/D129207
2022-07-11 15:29:44 +01:00
John Brawn ddd9485129 [MVE] Don't distribute add of vecreduce if it has more than one use
If the add has more than one use then applying the transformation
won't cause it to be removed, so we can end up applying it again
causing an infinite loop.

Differential Revision: https://reviews.llvm.org/D129361
2022-07-11 14:13:29 +01:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using it for the
conditional branch.

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
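
A scalar C++ sketch of the new control flow (assuming lane j of get.active.lane.mask(base, n) is base + j < n):

  #include <cstdint>
  void loop(uint64_t trip_count, uint64_t VF) {
    uint64_t i = 0;
    bool lane0 = 0 < trip_count;  // initial mask, set up in the pre-header
    while (lane0) {
      // ... vector body predicated on the mask for lanes [i, i+VF) ...
      i += VF;
      lane0 = i < trip_count;     // first bit of the next iteration's mask
    }
  }
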
David Green 0a11ad2aa8 [ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present.
If MVE.fp is not present then we cannot select the vector i1 fp
operations to VCMP instructions, so we need to expand them instead.
2022-07-11 13:03:30 +01:00
David Green 46fc4de065 [AArch64] Guard FP16 fptosi_sat patterns with HasFullFP16. NFC
We shouldn't get this far as the operations are already legalized, but
the patterns should be guarded with hasFullFP16.
2022-07-11 08:35:40 +01:00
LiaoChunyu 3f68f0f816 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcodes:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-07-11 14:10:27 +08:00