Commit Graph

22732 Commits

Author SHA1 Message Date
Kazu Hirata 3f3930a451 Remove redundaunt virtual specifiers (NFC)
Identified with tidy-modernize-use-override.
2022-07-25 23:00:59 -07:00
Luo, Yuanke 5fb4134210 [X86][DAGISel] Don't widen shuffle element with AVX512
Currently the X86 shuffle lowering would widen the element type for
shuffle if the mask element value is adjacent. For below example

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

Compiler would transform the shuffle to
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the oppotunity to let ISel select mask instruction when
avx512 is enabled.

This patch is to prevent the tranform when avx512 feature is enabled.
Thank Simon for the idea.

Differential Revision: https://reviews.llvm.org/D129537
2022-07-26 11:56:03 +08:00
Craig Topper 00060a7b97 [X86] Custom type legalize v2i32 smulo/umulo to use a single pmuldq/pmuludq.
With SSE4.1 and above we were using 3 multiply instructions. This
was due to type legalization widening to v4i32 and the low half
being done with pmulld while the high half used two pmuldq/pmuludq.

Instead of that, we can use a single pmuludq/pmuldq to calculate
the full product at once, extract the high and low bits and compare
to check for overflow.

I've restricted SMULO to sse4.1 to get pmuldq. We can probably
do a fixup to pmuludq on earlier targets, but that's for another day.

I was going through my git stash and found an early version of this patch
from a year or two ago so I went ahead and finished it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130432
2022-07-25 09:12:35 -07:00
Kazu Hirata b5188591a0 [llvm] Remove redundaunt virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Simon Pilgrim 0708771cce [DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC.
Just use KnownBits::isZero() to ensure all the bits are known zero.
2022-07-24 13:12:21 +01:00
Simon Pilgrim 69d1e805ce [X86] combineAndnp - remove unused variable. NFC. 2022-07-24 11:32:44 +01:00
Simon Pilgrim ce81a0df67 [X86][SSE] Enable X86ISD::ANDNP constant folding 2022-07-24 11:07:34 +01:00
Simon Pilgrim 293899c64b [X86] Don't assume an AND/ANDNP element is undef/undemanded just because one element is undef
For mask ops like these, the other operand's corresponding element might be zero (result = zero) - so we must demand all the bits and that element.

This appears to be what D128570 was trying to fix - both sides of the funnel shift mask of the vXi64 (legalized to v2Xi32) were incorrectly simplifying the upper 32-bit halves to undef, resulting in bad folds later on.

I intend to address the test case regressions, but this close to the release branch I'd prefer to get a fix in first.
2022-07-24 10:53:38 +01:00
Simon Pilgrim 676a03d8a5 [X86] matchBinaryShuffle - limit SHUFFLE(X,Y) -> OR(X,Y) cases to where X + Y are the same width as the result
Minor bit of prep work toward not unnecessarily widening shuffle operands in combineX86ShufflesRecursively, instead only widening in combineX86ShuffleChain if we actual find a match - see Issue #45319
2022-07-23 16:56:45 +01:00
Arnold Schwaighofer 58e6ee0e1f llvm.swift.async.context.addr cannot be modeled as NoMem because we don't want it to be cse'd accross async suspends
An async suspend models the split between two partial async functions.
`llvm.swift.async.context.addr ` will have a different value in the two
partial functions so it is not correct to generally CSE the instruction.

rdar://97336162

Differential Revision: https://reviews.llvm.org/D130201
2022-07-22 11:50:58 -07:00
Phoebe Wang 02fe96b240 [X86][FP16] Do not split FP64->FP16 to FP64->FP32->FP16
Truncation from double to half is not always identical to truncating to float first and then to half. https://godbolt.org/z/56s9517hd

On the other hand, expanding to float and then to double is always identical to expanding to double directly. https://godbolt.org/z/Ye8vbYPnY

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D130151
2022-07-22 08:36:05 +08:00
Haohai Wen d946fb8d95 [X86] Make sure load size is not larger than stack slot
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D130084
2022-07-20 12:17:44 +08:00
Sanjay Patel f0dd12ec5c [x86] use zero-extending load of a byte outside of loops too (2nd try)
The first attempt missed changing test files for tools
(update_llc_test_checks.py).

Original commit message:

This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.

As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.

This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in
advance, and that will likely also depend on exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.

Differential Revision: https://reviews.llvm.org/D129775
2022-07-19 21:27:08 -04:00
Sanjay Patel 95401b0153 Revert "[x86] use zero-extending load of a byte outside of loops too"
This reverts commit 9d1ea1774c.
There are tests of update_llc_tests_checks.py that missed being updated.
2022-07-19 17:37:22 -04:00
Sanjay Patel 9d1ea1774c [x86] use zero-extending load of a byte outside of loops too
This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.

As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.

This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in
advance, and that will likely also depend on exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.

Differential Revision: https://reviews.llvm.org/D129775
2022-07-19 16:43:47 -04:00
Bing1 Yu e01bf5a3e2 [X86] Promote v32f16's fadd into v32f32's fadd when it is avx512 without avx512fp16
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D130059
2022-07-19 14:37:50 +08:00
Matt Arsenault 8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Benjamin Kramer 9234a7c0df [X86][FP16] Don't crash when lowering SELECT on fp16 vectors
This is a regression from f187948162
2022-07-18 13:41:00 +02:00
Phoebe Wang f187948162 [X86][FP16] Enable vector support for FP16 emulation
This is follow up of D107082, which enable vector support according to psABI.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D127982
2022-07-16 09:38:58 +08:00
Phoebe Wang 190518da4b [X86] Use generic tuning for "x86-64" if "tune-cpu" is not specified
This is an alternative to D129154. See discussions on https://discourse.llvm.org/t/fast-scalar-fsqrt-tuning-in-x86/63605

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129647
2022-07-15 10:05:08 +08:00
David Green 3e0bf1c7a9 [CodeGen] Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo.  The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Recommitted with some fixes for the leftover MCII variables in release
builds.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-14 09:33:28 +01:00
David Green 95252133e1 Revert "Move instruction predicate verification to emitInstruction"
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warning
than I saw locally. Reverting to fix those issues.
2022-07-13 13:28:11 +01:00
David Green e2fb8c0f4b Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo.  The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-13 12:53:32 +01:00
Simon Pilgrim 66bfd1ba8c [X86] Move isInRange(ArrayRef<int>) inside assert to fix NDEBUG builds. NFC.
Fix unused static function warning introduced by D129207
2022-07-12 21:51:07 +01:00
Nick Desaulniers 2240d72f15 [X86] initial -mfunction-return=thunk-extern support
Adds support for:
* `-mfunction-return=<value>` command line flag, and
* `__attribute__((function_return("<value>")))` function attribute

Where the supported <value>s are:
* keep (disable)
* thunk-extern (enable)

thunk-extern enables clang to change ret instructions into jmps to an
external symbol named __x86_return_thunk, implemented as a new
MachineFunctionPass named "x86-return-thunks", keyed off the new IR
attribute fn_ret_thunk_extern.

The symbol __x86_return_thunk is expected to be provided by the runtime
the compiled code is linked against and is not defined by the compiler.
Enabling this option alone doesn't provide mitigations without
corresponding definitions of __x86_return_thunk!

This new MachineFunctionPass is very similar to "x86-lvi-ret".

The <value>s "thunk" and "thunk-inline" are currently unsupported. It's
not clear yet that they are necessary: whether the thunk pattern they
would emit is beneficial or used anywhere.

Should the <value>s "thunk" and "thunk-inline" become necessary,
x86-return-thunks could probably be merged into x86-retpoline-thunks
which has pre-existing machinery for emitting thunks (which could be
used to implement the <value> "thunk").

Has been found to build+boot with corresponding Linux
kernel patches. This helps the Linux kernel mitigate RETBLEED.
* CVE-2022-23816
* CVE-2022-28693
* CVE-2022-29901

See also:
* "RETBLEED: Arbitrary Speculative Code Execution with Return
Instructions."
* AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion
* TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0
  2022-07-12
* Return Stack Buffer Underflow / Return Stack Buffer Underflow /
  CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702

SystemZ may eventually want to support "thunk-extern" and "thunk"; both
options are used by the Linux kernel's CONFIG_EXPOLINE.

This functionality has been available in GCC since the 8.1 release, and
was backported to the 7.3 release.

Many thanks for folks that provided discrete review off list due to the
embargoed nature of this hardware vulnerability. Many Bothans died to
bring us this information.

Link: https://www.youtube.com/watch?v=IF6HbCKQHK8
Link: https://github.com/llvm/llvm-project/issues/54404
Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html
Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html

Reviewed By: aaron.ballman, craig.topper

Differential Revision: https://reviews.llvm.org/D129572
2022-07-12 09:17:54 -07:00
Xiang1 Zhang a45dd3d814 [X86] Support -mstack-protector-guard-symbol
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129346
2022-07-12 10:17:00 +08:00
Xiang1 Zhang 643786213b Revert "[X86] Support -mstack-protector-guard-symbol"
This reverts commit efbaad1c4a.
due to miss adding review info.
2022-07-12 10:14:32 +08:00
Xiang1 Zhang efbaad1c4a [X86] Support -mstack-protector-guard-symbol 2022-07-12 10:13:48 +08:00
spupyrev eecd41aa09 Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636a.
2022-07-11 09:50:47 -07:00
Rafael Auler 6d0528636a Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-07-11 09:31:52 -07:00
Simon Pilgrim 97868fb972 [X86] isTargetShuffleEquivalent - attempt to match SM_SentinelZero shuffle mask elements using known bits
If the combined shuffle mask requires zero elements, we don't currently have much chance of matching them against the expected source vector. This patch uses the SelectionDAG::MaskedVectorIsZero wrapper to attempt to determine if the expected lement we want to use is already known to be zero.

I've also tightened up the ExpectedMask assertion to always be in range - we're never giving it a target shuffle mask that has sentinels at all - allowing to remove some of the confusing bounds checks.

This attempts to address some of the regressions uncovered by D129150 where we more aggressively fold shuffles as AND / 'clear' masks which results in more combined shuffles using SM_SentinelZero.

Differential Revision: https://reviews.llvm.org/D129207
2022-07-11 15:29:44 +01:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Phoebe Wang 8fb083d33e [X86][FP16] Add constrained FP support for scalar emulation
This is a follow up patch to support constrained FP in FP16 emulation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D128114
2022-07-08 20:33:42 +08:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Haohai Wen 18a1085e02 [X86] Fix collectLeaves for adds used by phi that forms loop
When add has additional users, we should indentify whether add's
user is phi that forms loop rather than root's.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D129169
2022-07-08 10:39:02 +08:00
Phoebe Wang 6c535f9f1b [X86][FP16] Fix crash when lowering copysign for f16
This is to address the assertion fail reported in https://reviews.llvm.org/D107082#3635612
Not sure if it is a problem of promoting FCOPYSIGN + libcall FP_ROUND.
The promoting will set the rounding mode to 1 a442c62888/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (L4810-L4814)
While libcall cannot handle the rounding mode equals to 1 a442c62888/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (L4324-L4328)
So changing the action to Expand to workaround the problem.

Reviewed By: clementval, MaskRay

Differential Revision: https://reviews.llvm.org/D129294
2022-07-07 19:17:26 -07:00
Nicolai Hähnle 64a78c8501 Remove unnecessary includes of ManagedStatic.h
Differential Revision: https://reviews.llvm.org/D129115
2022-07-07 14:29:20 +02:00
Tim Northover 8d9dc83f35 X86: add newline to end of FMA instruction comments.
The newline is used by Disassembler.cpp (`emitComments`) to work out how to
format them properly, and if there's no newline it goes into an infinite loop.

Unfortunately I couldn't get llvm-objdump to be affected, only the MacOS otool
utility which dlopens libLTO.
2022-07-07 12:35:28 +01:00
Simon Pilgrim fbb51ac0ba [X86] LowerShift - lower some shuffles directly to X86ISD::PSHUFLW nodes.
These are expected to lower to X86ISD::PSHUFLW but we were seeing some regressions in D129150 because it'd managed to exploit the masking of the shift amounts to create unintended clear masks instead.
2022-07-06 18:01:03 +01:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Paul Robinson 08e4fe6c61 [X86] Add RDPRU instruction
Add support for the RDPRU instruction on Zen2 processors.

User-facing features:

- Clang option -m[no-]rdpru to enable/disable the feature
- Support is implicit for znver2/znver3 processors
- Preprocessor symbol __RDPRU__ to indicate support
- Header rdpruintrin.h to define intrinsics
- "rdpru" mnemonic supported for assembler code

Internal features:

- Clang builtin __builtin_ia32_rdpru
- IR intrinsic @llvm.x86.rdpru

Differential Revision: https://reviews.llvm.org/D128934
2022-07-06 07:17:47 -07:00
Craig Topper 2bfca35614 [X86] Disable combineVectorSizedSetCCEquality for soft float.
The vector types aren't legal with soft float.
Also disable under NoImplicitFloat for good measure.

Fixes PR56351.

Differential Revision: https://reviews.llvm.org/D129060
2022-07-04 08:33:30 -07:00
Simon Pilgrim 26708fa166 Revert rG057db2002bb3: [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.

Reverted due to reports from @alexfh about it causing an infinite loop (repro still pending).
2022-07-01 10:36:09 +01:00
Simon Pilgrim e961e05d59 [SLP][X86] Add 32-bit vector stores to help vectorization opportunities
Building on the work on D124284, this patch tags v4i8 and v2i16 vector loads as custom, enabling SLP to try to vectorize these types ending in a partial store (using the SSE MOVD instruction) - we already do something similar for 64-bit vector types.

Differential Revision: https://reviews.llvm.org/D127604
2022-06-30 20:25:50 +01:00
Amir Ayupov cb75faf40c [X86][BOLT] Use getOperandType to determine memory access size
Generate INSTRINFO_OPERAND_TYPE table in X86GenInstrInfo.inc.

This diff adds support for instructions that were previously reported as having
memory access size 0. It replaces the heuristic of looking at instruction
register width to determine memory access width by instead checking the memory
operand type using tablegen-provided tables.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D126116
2022-06-30 00:25:32 -07:00
Luo, Yuanke 5cb0979870 [X86][AMX] Split greedy RA for tile register
When we fill the shape to tile configure memory, the shape is gotten
from AMX pseudo instruction. However the register for the shape may be
split or spilled by greedy RA. That cause we fill the shape to config
memory after ldtilecfg is executed, so that the shape configuration
would be wrong.
This patch is to split the tile register allocation from greedy register
allocation, so that after tile registers are allocated the shape
registers are still virtual register. The shape register only may be
redefined or multi-defined by phi elimination pass, two address pass.
That doesn't affect tile register configuration.

Differential Revision: https://reviews.llvm.org/D128584
2022-06-29 10:35:43 +08:00
Craig Topper 3706bdad4a [X86] Remove unnecessary COPY from EmitLoweredCascadedSelect.
I believe we already checked that the destination of the first
CMOV is only used by the second CMOV so I don't think there is any
reason we need the PHI to write the register that was used by the
first CMOV. We can directly use the second CMOV destination and
avoid the copy.

This may be a left over from when the cascaded select handling
was part of the main algorithm before it was refactored in D35685.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D128124
2022-06-28 09:33:33 -07:00
Simon Pilgrim 0b998053db [X86] combineConcatVectorOps - IsConcatFree must check extraction index
Identified in the regression reported by @alexfh on rGb5d7beeb9792 - IsConcatFree wasn't ensuring the subvector extraction index matched the position it would be concatenated back into.
2022-06-27 11:46:49 +01:00