Commit Graph

22873 Commits

Author SHA1 Message Date
Sylvestre Ledru cd20a18286 Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"
Causing:
https://github.com/llvm/llvm-project/issues/57709

This reverts commit ab56719acd.
2022-09-13 10:53:59 +02:00
David Majnemer ab56719acd [clang, llvm] Add __declspec(safebuffers), support it in CodeView
__declspec(safebuffers) is equivalent to
__attribute__((no_stack_protector)).  This information is recorded in
CodeView.

While we are here, add support for strict_gs_check.
2022-09-12 21:15:34 +00:00
Kazu Hirata 9606608474 [llvm] Use x.empty() instead of llvm::empty(x) (NFC)
I'm planning to deprecate and eventually remove llvm::empty.

I thought about replacing llvm::empty(x) with std::empty(x), but it
turns out that all uses can be converted to x.empty().  That is, no
use requires the ability of std::empty to accept C arrays and
std::initializer_list.

Differential Revision: https://reviews.llvm.org/D133677
2022-09-12 13:34:35 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Simon Pilgrim 20ad05f9b4 [CostModel][X86] Add CostKinds handling for abs ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695
2022-09-12 16:34:37 +01:00
Simon Pilgrim bd0109f392 [CostModel][X86] Move AVX512/AVX2 uniform shift costs into the generic uniform cost tables
They shouldn't be happening after XOP shift costs - AVX2 shift supports takes preference over XOP for everything but vXi8 shifts - the improvement is pretty limited as it only affects bdver4 targets but it does help clean up a fraction of the messy shift cost logic....
2022-09-12 12:08:42 +01:00
Simon Pilgrim a931dbfbd3 [CostModel][X86] Merge AVX512BW vXi8/vXi16 shifts into default AVX512BW cost table
We only need to handle the uniform cases early
2022-09-10 18:18:42 +01:00
Simon Pilgrim 10edf88458 [CostModel][X86] Update CTPOP costs
With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont
2022-09-10 17:57:20 +01:00
Simon Pilgrim 4994f87ca1 [X86] Fix bdver2 128-bit shuffles throughputs
Noticed while trying to get vector ctpop/ctlz/cttz costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2

Matches AMD 15h SoG, Agner and instlatx64
2022-09-10 17:34:40 +01:00
Simon Pilgrim 7785bd34e7 [X86] Fix bdver2 128-bit ALU/logic/shift throughputs
Noticed while trying to get vector shifts costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2

Matches AMD 15h SoG, Agner and instlatx64
2022-09-10 16:23:29 +01:00
Simon Pilgrim 05f56f10ed [X86] Fix VPPERM load folding latency
Noticed while investigating BITREVERSE cost numbers with the D103695 script - VPPERM folded loads was using the WriteVarShuffleX defaults and was missing an override like the VPPERM reg-reg variants
2022-09-09 13:57:39 +01:00
Simon Pilgrim 55b78e28d8 [CostModel][X86] Add missing i8 throughput cost 2022-09-09 10:58:51 +01:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Simon Pilgrim e74102a963 [CostModel][X86] Merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost
For the few non type based intrinsic cases we can just check for !isTypeBasedOnly() to access the args directly.

I don't think we have a need to keep getTypeBasedIntrinsicInstrCost in BasicTTIImpl.h any more and can do a similar merge there as well - but it's a messier refactor and will take a while.
2022-09-07 12:04:09 +01:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Xiang1 Zhang c836ddaf72 [X86][NFC] Refine load/store reg to StackSlot for extensibility
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D133078
2022-09-07 14:35:42 +08:00
Simon Pilgrim 648e182d92 [CostModel][X86] getIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers

This should allow us to merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost and remove all remaining references
2022-09-06 22:05:32 +01:00
Markus Böck f049b2c3fc [MC] Emit Stackmaps before debug info
This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment.

The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections.

This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op.

Differential Revision: https://reviews.llvm.org/D132708
2022-09-06 20:20:56 +02:00
Simon Pilgrim 10e0f3e948 [CostModel][X86] Add CostKinds handling for ctpop ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

Some of the pre-AVX values still aren't great - atom/slm worst case numbers for ctpop expansion really affect these (especially throughput/latency), so we need to clean them up in a more consistent way - its a pity we don't have models for more older cpus (merom/nehalem etc.) as other examples.
2022-09-06 17:27:24 +01:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Simon Pilgrim 83552e8c72 [CostModel][X86] Add CostKinds handling for SSE FCMP_ONE/FCMP_UEQ predicates
These require special handling to account for their expansion in lowering.

I'm trying very hard not to have to add predicate specific costs - but it might be inevitable.....
2022-09-06 12:05:22 +01:00
Simon Pilgrim c1b5e36d74 [CostModel][X86] Add CostKinds handling for fcmp ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

SSE numbers are still too low for FCMP_ONE/FCMP_UEQ cases which expand to a more complex sequence than the existing 'ExtraCost' system can manage.
2022-09-06 10:34:53 +01:00
Freddy Ye d5fa8b1c2c [X86] Support SAE for VCVTPS2PH from intrinsic.
For now, clang and gcc both failed to generate sae version from _mm512_cvt_roundps_ph:
https://godbolt.org/z/oh7eTGY5z. Intrinsic guide description is also wrong, which will be
update soon.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D132641
2022-09-06 11:28:12 +08:00
Simon Pilgrim bd0801cddf [X86] Cleanup SLM SSE shift and CMPGTQ scheduler model numbers
These were causing weird mismatches for the D103695 script report as I'm trying to enable cost kinds support for vector shift and integer comparisons.

The SSE shifts by (non-constant) scalar are half-rate but still only 1uop and PCMPGT is half-rate and only on Pipe0 (although not as slow as PCMPEQQ which we already handle).
2022-09-05 13:44:05 +01:00
Simon Pilgrim 8534f51474 [CostModel][X86] Add CostKinds handling for sqrt intrinsicc
This was achieved using the 'cost-tables vs llvm-mca' script from D103695

Some of the znver1/znver2 latency/throughput numbers were really weird (some copy+paste afaict) - I've used the numbers from the AMD SoG, which roughly match the 'worst case' range value from Agner
2022-09-04 18:39:21 +01:00
Simon Pilgrim 626a84db47 [CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers
2022-09-04 17:59:08 +01:00
Simon Pilgrim 80d4b3a275 Revert rG06e73626cf0fc33b025a0f98f1eee4a302279982 "[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry"
Some arm buildbots are complaining about a phase ordering test failure in unsigned-multiply-overflow-check.ll - I guess this test needs making x86 specific first
2022-09-04 17:51:11 +01:00
Simon Pilgrim 06e73626cf [CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers
2022-09-04 17:28:45 +01:00
Simon Pilgrim 59dbd6a0cf [CostModel][X86] Remove redundant AVX512 v64i8 shift costs
These are handled earlier (and more accurately) in AVX512BWShiftCostTable
2022-09-04 14:06:26 +01:00
Simon Pilgrim c444af1c20 [CostModel][X86] Add CostKinds handling for mul ops
This was achieved using the 'cost-tables vs llvm-mca' script D103695

Also fix a missing pmullw v16i16 half-rate throughput as znver1 double-pumps - matches numbers from AMD SoG + Agner
2022-09-04 11:59:05 +01:00
Kazu Hirata 9eca5ed790 [llvm] Use std::enable_if_t (NFC) 2022-09-03 11:17:44 -07:00
Simon Pilgrim 444685de06 [CostModel][X86] Adjust mul v4i32/v8i32 throughput cost
Based off the numbers from AMD SoG + Agner - vXi32 are both half-rate, and znver1 double-pumps the v8i32 op

We should have caught this earlier as many Intel models have half-rate pmulld already :-(
2022-09-03 18:45:08 +01:00
Simon Pilgrim 114b7762a9 [CostModel][X86] Add CostKinds handling for add/sub ops
This was achieved using the 'cost-tables vs llvm-mca' script D103695
2022-09-03 18:45:08 +01:00
Simon Pilgrim 5aee2726d8 [CostModel][X86] Add CostKinds handling for fdiv ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'

As the uop count (used for TCK_SizeAndLatency) for divss/divps is typically so low, we need to override isExpensiveToSpeculativelyExecute to ensure we keep fdiv calls behind branches - although for some very recent cpu targets it might not be necessary any more and could be relaxed.
2022-09-03 15:48:39 +01:00
Simon Pilgrim bddbd408b7 [X86] Fix fdiv throughput/latency/uops counts
Matches znver1/2 numbers from AMD SoG + Agner - no additional uops for folded instructions and znver1 double pumps 256-bit vectors

Matches skylake/icelake throughput numbers from Intel AoM + Agner/instlatx64

Noticed while adding fdiv CostKinds support
2022-09-03 15:23:46 +01:00
Simon Pilgrim 1c12e12111 [CostModel][X86] Add fdiv(double) throughput x87 costs for 2022-09-03 14:08:25 +01:00
Simon Pilgrim bd956b7db3 [X86] Fix fmul throughput/latency/uops counts
Matches numbers from AMD SoG + Agner - should always be on FPU Pipes 0+1, no additional uops for folded instructions and znver1 double pumps 256-bit vectors and is always latency = 4cy for f64 multiplies

Noticed while adding fmul CostKinds support to the x86 cost models in rG0735200e3f50 and znver1 wasn't being flagged as requiring 2uop for 256-bit vectors
2022-09-03 11:10:51 +01:00
Simon Pilgrim 0735200e3f [CostModel][X86] Add CostKinds handling for fmul ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-03 10:42:20 +01:00
Simon Pilgrim 82090cb85e [CostModel][X86] Remove unused float x87 costs
We only need the double costs for SSE1 fallback
2022-09-03 09:59:20 +01:00
Simon Pilgrim 116d8f8cf0 Revert rG11765b77be84d793ebedc5b5436c463490746131 "[CostModel][X86] Add CostKinds handling for fmul ops"
I need to address some x87 codegen changes before re-committing this.
2022-09-02 17:21:25 +01:00
Simon Pilgrim 11765b77be [CostModel][X86] Add CostKinds handling for fmul ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-02 16:57:23 +01:00
Simon Pilgrim 6941b1f6c1 [CostModel][X86] Add CostKinds to SSE42 fadd/fsub/fneg ops
These were missed in an earlier commit, the latency/codesize/size-latency numbers aren't different from the SSE2 values that it was falling through to, hence no test change, but it did mean we were wasting a lookup.
2022-09-02 16:32:44 +01:00
Simon Pilgrim ad16f3e413 [CostModel][X86] Add CostKinds handling for fadd/fsub/fneg ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 which I'll update shortly

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-02 11:50:01 +01:00
Antonio Frighetto f0c50447f6 [X86InstPrinter] Introduce markup tags emission
x86 assembly syntax emission now leverages markup tags, if enabled.

Differential Revision: https://reviews.llvm.org/D129869
2022-09-01 21:04:35 -07:00
Simon Pilgrim f8d4da7630 [X86] Fix reciprocal instruction throughput/uops counts
Matches numbers from AMD SoG + Agner - should always be on FPU Pipes 0+1, no additional uops for folded instructions and znver1 double pumps 256-bit vectors

Noticed while adding CostKinds support to the x86 cost models
2022-09-01 20:25:52 +01:00
Simon Pilgrim ad8e4dd2ad [CostModel][X86] Add and/or/xor general cost kinds support
Account for double-pumping on early AVX1/AVX2 targets
2022-08-31 17:26:05 +01:00
Simon Pilgrim 1209b9c2c2 [CostModel][X86] Replace CostKindCosts constructor with default values.
This improves static initialization of the cost tables and significantly speeds up MSVC compile time.
2022-08-31 10:44:44 +01:00
Simon Pilgrim 7830445086 [CostModel][X86] Account for add/sub 512-bit vector splitting costs on non-AVX512BW targets 2022-08-30 16:54:06 +01:00
Kazu Hirata 8feb60756c [llvm] Use range-based for loops (NFC) 2022-08-28 23:28:58 -07:00