Commit Graph

22902 Commits

Author SHA1 Message Date
Simon Pilgrim 83552e8c72 [CostModel][X86] Add CostKinds handling for SSE FCMP_ONE/FCMP_UEQ predicates
These require special handling to account for their expansion in lowering.

I'm trying very hard not to have to add predicate specific costs - but it might be inevitable.....
2022-09-06 12:05:22 +01:00
Simon Pilgrim c1b5e36d74 [CostModel][X86] Add CostKinds handling for fcmp ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

SSE numbers are still too low for FCMP_ONE/FCMP_UEQ cases which expand to a more complex sequence than the existing 'ExtraCost' system can manage.
2022-09-06 10:34:53 +01:00
Freddy Ye d5fa8b1c2c [X86] Support SAE for VCVTPS2PH from intrinsic.
For now, clang and gcc both failed to generate sae version from _mm512_cvt_roundps_ph:
https://godbolt.org/z/oh7eTGY5z. Intrinsic guide description is also wrong, which will be
update soon.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D132641
2022-09-06 11:28:12 +08:00
Simon Pilgrim bd0801cddf [X86] Cleanup SLM SSE shift and CMPGTQ scheduler model numbers
These were causing weird mismatches for the D103695 script report as I'm trying to enable cost kinds support for vector shift and integer comparisons.

The SSE shifts by (non-constant) scalar are half-rate but still only 1uop and PCMPGT is half-rate and only on Pipe0 (although not as slow as PCMPEQQ which we already handle).
2022-09-05 13:44:05 +01:00
Simon Pilgrim 8534f51474 [CostModel][X86] Add CostKinds handling for sqrt intrinsicc
This was achieved using the 'cost-tables vs llvm-mca' script from D103695

Some of the znver1/znver2 latency/throughput numbers were really weird (some copy+paste afaict) - I've used the numbers from the AMD SoG, which roughly match the 'worst case' range value from Agner
2022-09-04 18:39:21 +01:00
Simon Pilgrim 626a84db47 [CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers
2022-09-04 17:59:08 +01:00
Simon Pilgrim 80d4b3a275 Revert rG06e73626cf0fc33b025a0f98f1eee4a302279982 "[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry"
Some arm buildbots are complaining about a phase ordering test failure in unsigned-multiply-overflow-check.ll - I guess this test needs making x86 specific first
2022-09-04 17:51:11 +01:00
Simon Pilgrim 06e73626cf [CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers
2022-09-04 17:28:45 +01:00
Simon Pilgrim 59dbd6a0cf [CostModel][X86] Remove redundant AVX512 v64i8 shift costs
These are handled earlier (and more accurately) in AVX512BWShiftCostTable
2022-09-04 14:06:26 +01:00
Simon Pilgrim c444af1c20 [CostModel][X86] Add CostKinds handling for mul ops
This was achieved using the 'cost-tables vs llvm-mca' script D103695

Also fix a missing pmullw v16i16 half-rate throughput as znver1 double-pumps - matches numbers from AMD SoG + Agner
2022-09-04 11:59:05 +01:00
Kazu Hirata 9eca5ed790 [llvm] Use std::enable_if_t (NFC) 2022-09-03 11:17:44 -07:00
Simon Pilgrim 444685de06 [CostModel][X86] Adjust mul v4i32/v8i32 throughput cost
Based off the numbers from AMD SoG + Agner - vXi32 are both half-rate, and znver1 double-pumps the v8i32 op

We should have caught this earlier as many Intel models have half-rate pmulld already :-(
2022-09-03 18:45:08 +01:00
Simon Pilgrim 114b7762a9 [CostModel][X86] Add CostKinds handling for add/sub ops
This was achieved using the 'cost-tables vs llvm-mca' script D103695
2022-09-03 18:45:08 +01:00
Simon Pilgrim 5aee2726d8 [CostModel][X86] Add CostKinds handling for fdiv ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'

As the uop count (used for TCK_SizeAndLatency) for divss/divps is typically so low, we need to override isExpensiveToSpeculativelyExecute to ensure we keep fdiv calls behind branches - although for some very recent cpu targets it might not be necessary any more and could be relaxed.
2022-09-03 15:48:39 +01:00
Simon Pilgrim bddbd408b7 [X86] Fix fdiv throughput/latency/uops counts
Matches znver1/2 numbers from AMD SoG + Agner - no additional uops for folded instructions and znver1 double pumps 256-bit vectors

Matches skylake/icelake throughput numbers from Intel AoM + Agner/instlatx64

Noticed while adding fdiv CostKinds support
2022-09-03 15:23:46 +01:00
Simon Pilgrim 1c12e12111 [CostModel][X86] Add fdiv(double) throughput x87 costs for 2022-09-03 14:08:25 +01:00
Simon Pilgrim bd956b7db3 [X86] Fix fmul throughput/latency/uops counts
Matches numbers from AMD SoG + Agner - should always be on FPU Pipes 0+1, no additional uops for folded instructions and znver1 double pumps 256-bit vectors and is always latency = 4cy for f64 multiplies

Noticed while adding fmul CostKinds support to the x86 cost models in rG0735200e3f50 and znver1 wasn't being flagged as requiring 2uop for 256-bit vectors
2022-09-03 11:10:51 +01:00
Simon Pilgrim 0735200e3f [CostModel][X86] Add CostKinds handling for fmul ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-03 10:42:20 +01:00
Simon Pilgrim 82090cb85e [CostModel][X86] Remove unused float x87 costs
We only need the double costs for SSE1 fallback
2022-09-03 09:59:20 +01:00
Simon Pilgrim 116d8f8cf0 Revert rG11765b77be84d793ebedc5b5436c463490746131 "[CostModel][X86] Add CostKinds handling for fmul ops"
I need to address some x87 codegen changes before re-committing this.
2022-09-02 17:21:25 +01:00
Simon Pilgrim 11765b77be [CostModel][X86] Add CostKinds handling for fmul ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-02 16:57:23 +01:00
Simon Pilgrim 6941b1f6c1 [CostModel][X86] Add CostKinds to SSE42 fadd/fsub/fneg ops
These were missed in an earlier commit, the latency/codesize/size-latency numbers aren't different from the SSE2 values that it was falling through to, hence no test change, but it did mean we were wasting a lookup.
2022-09-02 16:32:44 +01:00
Simon Pilgrim ad16f3e413 [CostModel][X86] Add CostKinds handling for fadd/fsub/fneg ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 which I'll update shortly

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-02 11:50:01 +01:00
Antonio Frighetto f0c50447f6 [X86InstPrinter] Introduce markup tags emission
x86 assembly syntax emission now leverages markup tags, if enabled.

Differential Revision: https://reviews.llvm.org/D129869
2022-09-01 21:04:35 -07:00
Simon Pilgrim f8d4da7630 [X86] Fix reciprocal instruction throughput/uops counts
Matches numbers from AMD SoG + Agner - should always be on FPU Pipes 0+1, no additional uops for folded instructions and znver1 double pumps 256-bit vectors

Noticed while adding CostKinds support to the x86 cost models
2022-09-01 20:25:52 +01:00
Simon Pilgrim ad8e4dd2ad [CostModel][X86] Add and/or/xor general cost kinds support
Account for double-pumping on early AVX1/AVX2 targets
2022-08-31 17:26:05 +01:00
Simon Pilgrim 1209b9c2c2 [CostModel][X86] Replace CostKindCosts constructor with default values.
This improves static initialization of the cost tables and significantly speeds up MSVC compile time.
2022-08-31 10:44:44 +01:00
Simon Pilgrim 7830445086 [CostModel][X86] Account for add/sub 512-bit vector splitting costs on non-AVX512BW targets 2022-08-30 16:54:06 +01:00
Kazu Hirata 8feb60756c [llvm] Use range-based for loops (NFC) 2022-08-28 23:28:58 -07:00
Kazu Hirata 2833760c57 [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 17:35:09 -07:00
Kazu Hirata c63f823875 [llvm] Use range-based for loops (NFC) 2022-08-28 17:35:04 -07:00
Kazu Hirata a33ef8f2b7 Use llvm::all_equal (NFC) 2022-08-27 09:53:10 -07:00
Simon Pilgrim 0d0dc4e6ab [CostModel][X86] Add CodeSize handling for and/or/xor ops
Eventually this will be part of the cost table lookup
2022-08-26 18:42:52 +01:00
Simon Pilgrim f9445ae75c [CostModel][X86] Add CodeSize handling for fneg ops
Eventually this will be part of the cost table lookup
2022-08-26 17:34:52 +01:00
Simon Pilgrim 7790518c1f [CostModel][X86] getArithmeticInstrCost - use the cost tables for all cost kinds
The tables currently only have TCK_RecipThroughput costs, but we should now be able to add individual entries without any further refactoring
2022-08-26 17:34:52 +01:00
Simon Pilgrim fef3eeef48 [CostModel][X86] Convert AVX2 SRA by uniform constant to cost table
When adding cost kind support it will be easier to maintain these if we're not calculating on the fly
2022-08-26 16:14:13 +01:00
Simon Pilgrim f3590b6440 [CostModel][X86] getArithmeticInstrCost - move SLM reduceVMULWidth cost handling into the generic MUL handling
This is still SLM specific atm, but converting this to more closely match the codegen from reduceVMULWidth should be straightforward
2022-08-26 16:14:12 +01:00
Simon Pilgrim 9c29b4a0ac [CostModel][X86] Convert AVX1/SSE41 SREM/SDIV by constants to cost tables
When adding cost kind support it will be easier to maintain these if we're not calculating on the fly
2022-08-26 16:14:12 +01:00
Simon Pilgrim 488861d0e3 [X86] Add SimplifyMultipleUseDemandedBitsForTargetNode X86ISD::ANDNP handling
See if we can remove X86ISD::ANDNP nodes by checking the state of the knownbits of the demanded elements.

This is equivalent to the generic ISD::AND handling, but flips the bits of the LHS operand to account for ANDNP.

Differential Revision: https://reviews.llvm.org/D128216
2022-08-26 14:28:35 +01:00
Simon Pilgrim 9f94240fe1 [CostModel][X86] getArithmeticInstrCost - use cost kind specific look up tables
Building on D132216, use CostKindTblEntry cost tables to simplify the transition to supporting cost kinds other than recip-throughput

Adding full cost kinds support is going to take a while, but by converting to CostKindTblEntry first it will make it easier to support the costs on a per-ISD basis.
2022-08-26 14:28:35 +01:00
Simon Pilgrim 1736f76948 [CostModel][X86] getTypeBasedIntrinsicInstrCost - adjustTableCost - split CostTblEntry into ISD/Cost pair. NFC
This will be necessary to allow us to reuse this for other cost kind types
2022-08-26 11:17:22 +01:00
Simon Pilgrim 3edec9ba60 [CostModel][X86] Support cost kind specific look up tables (REAPPLIED)
Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations.

I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs.

This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured.

I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing.

For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc.

REAPPLIED: Added early out to prevent getCmpSelInstrCost being used for anything but generic integer/float scalar/vector types - getTypeLegalizationCost can't handle the "exotic" TypeID enums that some passes attempt to get a costs for (aggregates etc.).

Differential Revision: https://reviews.llvm.org/D132216
2022-08-25 16:49:17 +01:00
David Majnemer bd28bd59a3 [clang-cl] /kernel should toggle bit 30 in @feat.00
The linker is supposed to detect when an object with /kernel is linked
with another object which is not compiled with /kernel. The linker
detects this by checking bit 30 in @feat.00.
2022-08-25 14:17:26 +00:00
Benjamin Kramer ab85996e47 Revert "[CostModel][X86] Support cost kind specific look up tables"
This reverts commit 45846854a2.

This triggers an assertion failure during Clang selfhost

Unknown type!
UNREACHABLE executed at llvm/lib/CodeGen/ValueTypes.cpp:548!
*** SIGABRT received by PID 6107 (TID 6107) on cpu 218 from PID 6107; stack trace: ***
    @     0x556c8827c2d1         64  llvm::llvm_unreachable_internal()
    @     0x556c82a5542a         32  llvm::MVT::getVT()
    @     0x556c82a54a28         80  llvm::EVT::getEVT()
    @     0x556c7dda1526         80  llvm::TargetLoweringBase::getValueType()
    @     0x556c8174dd38        112  llvm::BasicTTIImplBase<>::getTypeLegalizationCost()
    @     0x556c81755e72        144  llvm::X86TTIImpl::getCmpSelInstrCost()
    @     0x556c8174cadf        512  llvm::TargetTransformInfoImplCRTPBase<>::getInstructionCost()
    @     0x556c84ab4dd2         32  llvm::TargetTransformInfo::getInstructionCost()
    @     0x556c82ead283       1968  llvm::sinkRegion()
2022-08-25 15:42:44 +02:00
Simon Pilgrim 2e5f16516a [CostModel][X86] Add CodeSize handling for fdiv ops
Eventually this will be part of the cost table lookup
2022-08-25 14:08:03 +01:00
Simon Pilgrim 45846854a2 [CostModel][X86] Support cost kind specific look up tables
Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations.

I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs.

This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured.

I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing.

For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc.

Differential Revision: https://reviews.llvm.org/D132216
2022-08-25 12:23:36 +01:00
Sami Tolvanen cff5bef948 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Relands 67504c9549 with a fix for
32-bit builds.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 22:41:38 +00:00
Sami Tolvanen a79060e275 Revert "KCFI sanitizer"
This reverts commit 67504c9549 as using
PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.
2022-08-24 19:30:13 +00:00
Sami Tolvanen 67504c9549 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 18:52:42 +00:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Phoebe Wang 12b203ea7c [X86][FP16] Add the missing legal action for EXTRACT_SUBVECTOR
Fixes #57340

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D132563
2022-08-24 23:25:07 +08:00
Simon Pilgrim 9317e6311f [TTI] Add SK_Splice shuffle mask detection and X86 costs
Enables fixed sized vectors to detect SK_Splice shuffle patterns and provides basic X86 cost support

Differential Revision: https://reviews.llvm.org/D132374
2022-08-23 20:07:30 +01:00
Philip Reames df20ff9ae2 [TTI] Kill last couple uses of OperandValueKind in targets [nfc]
Use the accessor methods on the containing class instead so that we can change the representation.
2022-08-23 08:54:41 -07:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames 104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceeded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly dependend for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possible effected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
Philip Reames 478cf94378 [X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc]
This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties.  This change adds some accessor methods and uses them to simplify backend code.  The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.
2022-08-22 13:37:59 -07:00
Philip Reames 5e87a020a5 [X86][TTI] Rename OpNInfo to OpNKind [nfc]
Both are reasonable names; this is solely that an upcoming change can use the OpNInfo name, and the compiler can tell me if I forgot to update something (instead of silently passing along properties that might not hold.)
2022-08-22 13:37:59 -07:00
Simon Pilgrim dd5b48976c [CostModel][X86] getShuffleCost - treat SK_Splice as SK_PermuteTwoSrc
SK_Splice should be equivalent to a PALIGNR instruction etc. - but as discussed on D132308, until full fixed vector support for SK_Splice is in place, just assume its a SK_PermuteTwoSrc.
2022-08-22 10:51:08 +01:00
Simon Pilgrim 7ff2a9f250 [CostModel][X86] Add CodeSize handling for fadd/fsub/fmul/fsqrt ops
Eventually this will be part of the cost table lookup
2022-08-21 17:42:11 +01:00
Simon Pilgrim 3c4391b4bb Revert rG15de7aaae52ef4be9f9ff3b130804e5b5ccd29f4 "[CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops"
This is unintentionally affecting some backend tests
2022-08-21 16:51:45 +01:00
Simon Pilgrim 15de7aaae5 [CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops
Eventually this will be part of the cost table lookup
2022-08-21 16:39:57 +01:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata 8b1b0d1d81 Revert "Use std::is_same_v instead of std::is_same (NFC)"
This reverts commit c5da37e42d.

This patch seems to break builds with some versions of MSVC.
2022-08-20 23:00:39 -07:00
Kazu Hirata c5da37e42d Use std::is_same_v instead of std::is_same (NFC) 2022-08-20 22:36:26 -07:00
Kazu Hirata 258531b7ac Remove redundant initialization of Optional (NFC) 2022-08-20 21:18:28 -07:00
Simon Pilgrim fa96383506 [X86] Fold PMULUDQ(X,1) -> AND(X,(1<<32)-1) 'getZeroExtendInReg'
Fix cases where shl/srem/urem expansion results in a mulh/mul_lohi(x,1) 'pass through' that gets lowered to pmuludq.

Fixes #56684
2022-08-20 14:58:25 +01:00
Simon Pilgrim a7441289e2 [X86] Fix znver1 256-bit ALU/Logic/Blend uop counts
ymm instructions are double pumped on znver1 - noticed while trying to review size-latency costkinds numbers for D132216

Matches AMD 17h SOG / Agner / uops.info
2022-08-19 19:09:39 +01:00
Alexey Bataev 0e7ed32c71 [SLP]Cost for a constant buildvector.
In many cases constant buildvector results in a vector load from a
constant/data pool. Need to consider this cost too.

Differential Revision: https://reviews.llvm.org/D126885
2022-08-19 08:02:42 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Simon Pilgrim b864cad7b4 [CostModel][X86] Adjust SLM select costs to match poor throughput of pblendvb/blendvpd/blendvps 2022-08-18 17:05:38 +01:00
Simon Pilgrim 55b1a147f2 [CostModel][X86] getArithmeticInstrCost - use MUL/DIV/REM expansions for all cost kinds
The costs tables still assume throughput, but the general expansion patterns should be good for any cost kind
2022-08-18 14:18:54 +01:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Haohai Wen f4410d471f [X86] Add schedule module for Alderlake-P
The X86SchedAlderlakeP.td file is automatically generated by schedtool
(D130897). Most of instruction's scheduling information is based on
measured ADL-P data in uops.info. Some data is from GLC tpt/lat data
provided by intel doc. The rest instruction's scheduling information is
from skylake client schedule model in order to get a relative complete
model.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130959
2022-08-18 16:40:14 +08:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Nick Desaulniers 6b0e2fa6f0 [SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses
As part of re-architecting callbr to no longer use blockaddresses
(https://reviews.llvm.org/D129288), we don't really need them in MIR.
They make comparing MachineBasicBlocks of indirect targets during
MachineVerifier a PITA.

Suggested by @efriedma from the discussion:
https://reviews.llvm.org/D130290#3669531

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D130316
2022-08-17 09:34:31 -07:00
Eli Friedman cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
Bing1 Yu 807b8cb06c [X86] Fix a lowering issue of mask.compress which has undef float passthrough
Previously, LegaizeDAG didn't check mask.compress's passthrough might be float, and this lead to getConstant crash since it doesn't support fp

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131947
2022-08-16 17:54:45 +08:00
Simon Pilgrim a7b85e4c0c [X86] Freeze shl(x,1) -> add(x,x) vector fold (PR50468)
Vector fold shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468

Differential Revision: https://reviews.llvm.org/D106675
2022-08-15 16:17:21 +01:00
Simon Pilgrim 41bdb8cd36 [X86] Fold insert_vector_elt(undef, elt, 0) --> scalar_to_vector(elt)
I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first.

Fixes regressions from D106675.
2022-08-15 14:56:30 +01:00
Simon Pilgrim 8b47e29fa0 [X86] combineVectorShiftImm - fold (shl (add X, X), C) -> (shl X, (C + 1))
Noticed while investigating the regressions in D106675
2022-08-14 17:42:02 +01:00
Phoebe Wang 8b69549dc5 [X86][FP16] Promote FP16->[U]INT to FP16->FP32->[U]INT
This is to avoid f16->i64 being lowered to `__fixhfdi/__fixunshfdi` on 32-bits since neither libgcc nor compiler-rt provide them. https://godbolt.org/z/cjWEsea5v

It also helps to improve the performance by promoting the vector type.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D131828
2022-08-14 09:37:33 +08:00
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
James Y Knight 4d7f9b7489 X86: Don't fold TEST into ADD ...@GOTTPOFF/GOTNTPOFF/INDNTPOFF
The linker may convert such an ADD into a LEA, so we must not
use the EFLAGS output.

This causes miscompiles with -fsanitize=null after
bacdf80f42 added
llvm.threadlocal.address -- previously, global variables were known to
be non-null, but the intrinsic is not currently known to return
nonnull. (That should be corrected, but it shouldn't've caused
miscompiles!)

Differential Revision: https://reviews.llvm.org/D131716
2022-08-12 20:52:00 +00:00
Simon Pilgrim 6ba5fc2dee [X86] lowerShuffleWithVPMOV - support direct lowering to VPMOV on VLX targets
lowerShuffleWithVPMOV currently only matches shuffle(truncate(x)) patterns, but on VLX targets the truncate isn't usually necessary to make the VPMOV node worthwhile (as we're only targetting v16i8/v8i16 shuffles we're almost always ending up with a PSHUFB node instead). PACKSS/PACKUS are still preferred vs VPMOV due to their lower uop count.

Fixes the remaining regression from the fixes in rG293899c64b75
2022-08-11 17:40:07 +01:00
Simon Pilgrim 5dcf0c342b [X86] lowerShuffleWithVPMOV - remove oneuse constraints on shuffle(trunc(x),undef) -> vpmov(x) lowering
These were added in rG057bdd63 but shuffle combining has gotten a lot better at folding different vector widths since then.
2022-08-11 14:06:42 +01:00
aqjune 02e56e2533 [CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*` intel intrinsics
This patch makes the variants of `mm*_cast*` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly.
(These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874)

To do so, this patch

1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF`
2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130339
2022-08-11 13:36:21 +09:00
Amaury Séchet 9bceb8981d [X86] (0 - SetCC) | C -> (zext (not SetCC)) * (C + 1) - 1 if we can get a LEA out of it.
This adresses various regression in D131260 , as well as is a useful optimization in itself.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131358
2022-08-10 15:12:00 +00:00
Simon Pilgrim b92c7dc211 [X86] Use DAG.getFreeze() to create freeze node. NFC. 2022-08-10 15:03:56 +01:00
Alex Bradbury 7e7860c5d7 [X86][NFCI] Remove target-specific branch optimisation that's handled in BranchFolding
This specific optimisation is handled in OptimizeBlock in BranchFolding
so is redundant. As discussed on the review thread, I've verified that
we have test coverage for that optimisation within test/CodeGen/X86 by
disabling the BranchFolding version of this transform after applying
this patch and rerunning the test suite.

Differential Revision: https://reviews.llvm.org/D129204
2022-08-10 10:35:31 +01:00
Phoebe Wang c7ec6e19d5 [X86][BF16] Make backend type bf16 to follow the psABI
X86 psABI has updated to support __bf16 type, the ABI of which is the
same as FP16. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149

This is an alternative of D129858, which has less code modification and
supports the vector type as well.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D130832
2022-08-10 08:58:56 +08:00
Luo, Yuanke aaf6c7b05c [globalisel] Select register bank for DBG_VALUE
The register operand of DBG_VALUE is not selected to a proper register
bank in both AArch64 and X86. This would cause getRegClass crash after
global ISel. After discussion, we think the MIR should assume all
vritual register should be set proper register class after global ISel,
so this patch is to fix the gap of DBG_VALUE for AArch64 and X86.

Differential Revision: https://reviews.llvm.org/D129037
2022-08-09 13:11:51 +08:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Simon Pilgrim 9ea54ac9ce [X86] X86ISelDAGToDAG.cpp - use auto for all values derived from cast/dyn_cast (style). NFC. 2022-08-08 14:35:06 +01:00
Kazu Hirata ba0407ba86 [llvm] Use range-based for loops (NFC) 2022-08-07 00:16:21 -07:00
Kazu Hirata 54199d805a [x86] Remove unused declaration processWaitCnt (NFC)
The declaration was introduced without a corresponding definition on
Jan 2, 2022 in commit 85e6e748d4.
2022-08-07 00:16:19 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Krzysztof Parzyszek 2bc390bdd6 [RDF] Use default TargetOperandInfo if not given in constructor
All current in-tree users use the default implementation.
2022-08-06 14:32:52 -05:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
Phoebe Wang 2312b747b8 [X86] Move getting module flag into `runOnMachineFunction` to reduce compile-time. NFCI
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D131245
2022-08-05 01:58:17 -07:00
Phoebe Wang 7f648d27a8 Reland "[X86][MC] Always emit `rep` prefix for `bsf`"
`BMI` new instruction `tzcnt` has better performance than `bsf` on new
processors. Its encoding has a mandatory prefix '0xf3' compared to
`bsf`. If we force emit `rep` prefix for `bsf`, we will gain better
performance when the same code run on new processors.

GCC has already done this way: https://c.godbolt.org/z/6xere6fs1

Fixes #34191

Reviewed By: craig.topper, skan

Differential Revision: https://reviews.llvm.org/D130956
2022-08-05 10:22:48 +08:00