Commit Graph

17234 Commits

Author SHA1 Message Date
Sanjay Patel 97948620b1 [x86] add test for 3 fcmps and logic; NFC
This is a more complex pattern than we handled with the
initial patch for PR51245:
D110342 / 09e71c367a

We could extend the logic matching to allow a setcc as
one operand and an extract of vector setcc (or even an
arbitrary bool?) as the other.
2021-09-30 11:02:59 -04:00
Roman Lebedev 3bd02ec977
[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=2 2021-09-30 17:31:18 +03:00
Roman Lebedev 7dffb8b4da
[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=2 2021-09-29 22:17:12 +03:00
Wesley Wiser 2dd883439c [Mangler] Calculate the argument list byte count suffix correctly when returning large values
`__stdcall`, `__fastcall` and `__vectorcall` return large values via a
hidden pointer argument. However, the size of that argument should not
be included in the argument list byte count suffix added to the
function's decorated name.

This patch fixes that issue so that LLVM generates the same decorated
name as MSVC does.

MSVC example: https://godbolt.org/z/nc35MKPhr

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D110719
2021-09-29 11:42:28 -07:00
Roman Lebedev 465d2adbfb
[NFC][X86] Add codegen test coverage for interleaved load/store i8 stride=2 2021-09-29 15:28:05 +03:00
Martin Storsjö d6216e2cd1 [X86] Fix handling of i128<->fp on Windows
On Windows, i128 arguments are passed as indirect arguments, and
they are returned in xmm0.

This is mostly fixed up by `WinX86_64ABIInfo::classify` in Clang, making
the IR functions return v2i64 instead of i128, and making the arguments
indirect. However for cases where libcalls are generated in the target
lowering, the lowering uses the default x86_64 calling convention for
i128, where they are passed/returned as a register pair.

Add custom lowering logic, similar to the existing logic for i128
div/mod (added in 4a406d32e9),
manually making the libcall (while overriding the return type to
v2i64 or passing the arguments as pointers to arguments on the stack).

X86CallingConv.td doesn't seem to handle i128 at all, otherwise
the windows specific behaviours would ideally be implemented as
overrides there, in generic code, handling these cases automatically.

This fixes https://bugs.llvm.org/show_bug.cgi?id=48940.

Differential Revision: https://reviews.llvm.org/D110413
2021-09-29 13:05:59 +03:00
Arthur Eubanks aa53785f23 Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Previous revisions didn't properly declare the new dependencies.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 15:31:30 -07:00
Arthur Eubanks 7833d20f1f Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2e.

Breaks bots
2021-09-28 14:49:27 -07:00
Arthur Eubanks 2943071e2e [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 14:21:10 -07:00
Jay Foad b2b1a8b833 [LiveIntervals] Improve repair after convertToThreeAddress
After TwoAddressInstructionPass calls
TargetInstrInfo::convertToThreeAddress, improve the LiveIntervals repair
to cope with convertToThreeAddress creating more than one new
instruction.

This mostly seems to benefit X86. For example in
test/CodeGen/X86/zext-trunc.ll it converts:

  %4:gr32 = ADD32rr %3:gr32(tied-def 0), %2:gr32, implicit-def dead $eflags

to:

  undef %6.sub_32bit:gr64 = COPY %3:gr32
  undef %7.sub_32bit:gr64_nosp = COPY %2:gr32
  %4:gr32 = LEA64_32r killed %6:gr64, 1, killed %7:gr64_nosp, 0, $noreg

Differential Revision: https://reviews.llvm.org/D110335
2021-09-28 08:10:08 +01:00
Liu, Chen3 57e8f840b6 [X86][FP16] Fix a bug when Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A).
This bug was introduced by D109953. The operand order of generated FMA
is wrong.

Differential Revision: https://reviews.llvm.org/D110606
2021-09-28 11:38:53 +08:00
Xiang1 Zhang ebe9944a34 [ISel] Legalized arithmetic.fence.f128 for 32-bits target
Reviewed By: Craig Topper, Wang Pengfei

Differential Revision: https://reviews.llvm.org/D110467
2021-09-28 10:27:25 +08:00
Simon Pilgrim 540ed354d3 [X86] Add slow/fast pmulld test coverage to vector-mul.ll 2021-09-27 21:53:56 +01:00
Roman Lebedev ee6228ff8c
[NFC][X86] Add 'gather' optsize/minsize test coverage 2021-09-27 23:49:10 +03:00
Roman Lebedev f7e82e4fa8
[NFC][X86] Add test showing that legal `GATHER`'s are expoanded on Znver3 2021-09-27 22:41:09 +03:00
Simon Pilgrim 468ff703e1 [X86] combineVectorHADDSUB - remove the broken HOP(x,x) merging code (PR51974)
This intention of this code turns out to be superfluous as we can handle this with shuffle combining, and it has a critical flaw in that it doesn't check for dependencies.

Fixes PR51974
2021-09-27 10:41:22 +01:00
Freddy Ye 902ec6142a [X86][ISel] Lowering FROUND(f16) and FROUNDEVEN(f16)
When AVX512FP16 is enabled, FROUND(f16) cannot be dealt with
TypeLegalize, and no libcall in libm is ready for fround(f16) now.
FROUNDEVEN(f16) has related instruction in AVX512FP16.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110312
2021-09-27 13:35:03 +08:00
Amara Emerson acd13994d1 [GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour. 2021-09-26 17:25:38 -07:00
Simon Pilgrim c0eff50fc5 [X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold
The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.
2021-09-26 19:37:23 +01:00
Simon Pilgrim ed3e4917b3 [X86] Fold PACK(*_EXTEND_VECTOR_INREG, UNDEF) -> *_EXTEND_VECTOR_INREG
For 128-bit vectors, we can remove a PACK of a EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.
2021-09-26 19:37:22 +01:00
Simon Pilgrim 3fe9767204 [X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))
Merge addition of VPMADDWD nodes if each element pair doesn't use the upper element in each pair (i.e. its zero) - we can generalize this to either element in the pair if we one day create VPMADDWD with zero lower elements.

There are still a number of issues with extending/shuffling with 256/512-bit VPMADDWD nodes so this initially only works for v2i32/v4i32 cases - I'm working on removing all these limitations but there's still a bit of yak shaving to go.....
2021-09-26 18:08:29 +01:00
Simon Pilgrim 18c8ed5416 [DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.
2021-09-25 18:35:57 +01:00
Simon Pilgrim eb7c78c2c5 [X86][SSE] combineMulToPMADDWD - mask off upper bits of sign-extended vXi32 constants
If we are multiplying by a sign-extended vXi32 constant, then we can mask off the upper 16 bits to allow folding to PMADDWD and make use of its implicit sign-extension from i16
2021-09-25 15:50:45 +01:00
Simon Pilgrim 2a4fa0c27c [X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on sub-128 bit vectors 2021-09-25 15:50:45 +01:00
Simon Pilgrim f5a26ccae2 [X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on pre-SSE41 targets
We already do this on SSE41 targets where we have sext/zext instructions, now that combineShiftToPMULH handles SSE2 targets, we can enable this here as well.
2021-09-25 14:35:31 +01:00
Simon Pilgrim a25f25c3b7 [X86] combineShiftToPMULH - relax from ISA from SSE41 to SSE2
With improved shuffle combines (in particular canonicalizeShuffleWithBinOps), we can now usefully perform this on any SSE2+ target.

We should be able to remove this entirely and just use DAGCombiner's combineShiftToMULH if we can someday get it to support illegal (pre-widened) types.
2021-09-25 14:08:03 +01:00
Paul Robinson 6185ad03f1 [TargetLibraryInfo] Correctly handle sqrt*_finite
Other <math>_finite calls are marked as unavailable except on GNU/Linux;
it looks like the sqrt set was just overlooked.

Differential Revision: https://reviews.llvm.org/D110418
2021-09-24 11:57:38 -07:00
Stanislav Mekhanoshin 08d7eec06e Revert "Allow rematerialization of virtual reg uses"
Reverted due to two distcint performance regression reports.

This reverts commit 92c1fd19ab.
2021-09-24 10:26:11 -07:00
Simon Pilgrim d8fc9f8727 [X86][SSE] combineMulToPMADDWD - replace sext(v8i16) -> zext(v8i16)
As suggested on D108522, if we're sign extending a v4i16 source before multiplying as a v4i32, then we can replace that with a zero extension and rely on the implicit sign-extension of PMADDWD.
2021-09-24 16:42:01 +01:00
Sanjay Patel 09e71c367a [x86] convert logic-of-FP-compares to FP logic-of-vector-compares
This is motivated by the examples and discussion in:
https://llvm.org/PR51245
...and related bugs.

By using vector compares and vector logic, we can convert 2 'set'
instructions into 1 'movd' or 'movmsk' and generally improve
throughput/reduce instructions.

Unfortunately, we don't have a complete vector compare ISA before
AVX, so I left SSE-only out of this patch. Ie, we'd need extra logic
ops to simulate the missing predicates for SSE 'cmpp*', so it's not
as clearly a win.

Differential Revision: https://reviews.llvm.org/D110342
2021-09-24 11:38:19 -04:00
Jay Foad 7863cc6c1c [LiveIntervals] Fix repairOldRegInRange for simple def cases
The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238

Unrevert with some changes to the tests:
- Add -verify-machineinstrs to check for remaining problems in live
  interval support in TwoAddressInstructionPass.
- Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from
  some of those remaining problems.
2021-09-24 11:44:49 +01:00
Sanjay Patel 5188e2c9ce [x86] add AVX512 run for fcmp+logic ops; NFC
Suggested in D110342
2021-09-23 14:28:22 -04:00
Jay Foad deb2ca566a Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases"
This reverts commit 8229cb7412.

It was failing on buildbots with expensive checks enabled.
2021-09-23 17:55:05 +01:00
Jay Foad 8229cb7412 [LiveIntervals] Fix repairOldRegInRange for simple def cases
The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238
2021-09-23 17:16:14 +01:00
Sanjay Patel b240a2980b [x86] add AVX run to tests of fcmp logic; NFC
The ISA before AVX has predicate gaps for both fcmp
codegen alternatives, so that requires a more
complicated fix to get ideal asm in all cases.
2021-09-23 11:21:39 -04:00
Liu, Chen3 76656ec8ec [X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)
This patch is to support transform something like
_mm512_add_ph(acc, _mm512_fmadd_pch(a, b, _mm512_setzero_ph()))
to _mm512_fmadd_pch(a, b, acc).

Differential Revision: https://reviews.llvm.org/D109953
2021-09-23 15:37:08 +08:00
Wang, Pengfei ebec077e07 [X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109658
2021-09-23 11:02:48 +08:00
Hongtao Yu d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Kirill Stoimenov 2649999579 [asan] Fixed a bug causing a crash when redzone optimization kicked in on X86 with -asan-optimize-callbacks flag on.
This change adds the ASan intrinsic to the list whihc are setting hasCopyImplyingStackAdjustment.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D110012
2021-09-21 22:26:03 +00:00
Craig Topper b81e26c7f4 Recommit "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering."
This time with the right bug number.

When we rewrite the setcc we replace set old setcc output register
with the new CondReg. But since CondReg can be shared by other
replacements, we don't know if the kill flags for the old register
are valid for CondReg. So be conservative and remove them.

The test case has a SETCCr and a SETCCm on the same condition so
they end up sharing the same CondReg. The SETCCr had one use with
a kill flag. This kill flag isn't valid after the replacement because
CondReg needs a live range extending to the later SETCCm replacment.

Fixes PR51903.
2021-09-21 14:59:25 -07:00
Craig Topper 51a82e051e Revert "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering."
This reverts commit 7550f146ff.

I botched the bug number.
2021-09-21 14:33:44 -07:00
Craig Topper 7550f146ff [X86] Clear kill flags when rewriting SETCC uses in flag copy lowering.
When we rewrite the setcc we replace set old setcc output register
with the new CondReg. But since CondReg can be shared by other
replacements, we don't know if the kill flags for the old register
are valid for CondReg. So be conservative and remove them.

The test case has a SETCCr and a SETCCm on the same condition so
they end up sharing the same CondReg. The SETCCr had one use with
a kill flag. This kill flag isn't valid after the replacement because
CondReg needs a live range extending to the later SETCCm replacment.

Fixes PR51908.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110046
2021-09-21 14:29:46 -07:00
Amara Emerson 4ceea77409 [X86] Rename the X86WinAllocaExpander pass and related symbols to "DynAlloca". NFC.
For x86 Darwin, we have a stack checking feature which re-uses some of this
machinery around stack probing on Windows. Renaming this to be more appropriate
for a generic feature.

Differential Revision: https://reviews.llvm.org/D109993
2021-09-20 16:19:28 -07:00
Alex Richardson 817e23d481 [update_mir_test_checks.py] Use -NEXT FileCheck directories
Previously the script emitted output using plain CHECK directives. This
can result in a test passing even if there are some instructions between
CHECK directives that should have been removed. It also makes debugging
tests that have the output in a different order more difficult since
FileCheck can match with a later line and then complain about the "wrong"
directive not being found.

This will cause quite large diffs when updating existing tests, but I'm not sure we need an opt-in flag here.

Depends on D109765 (pre-commit tests)

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D109767
2021-09-20 12:55:56 +01:00
Petar Avramovic e4c46ddd91 [GlobalISel] Improve elimination of dead instructions in legalizer
Add eraseInstr(s) utility functions. Before deleting an instruction
collects its use instructions. After deletion deletes use instructions
that became trivially dead.
This patch clears all dead instructions in existing legalizer mir tests.

Differential Revision: https://reviews.llvm.org/D109154
2021-09-20 13:00:58 +02:00
Simon Pilgrim 0e89ff8195 [X86] SimplifyDemandedBits - only narrow a broadcast source if we only have one use.
Helps with the regression noted on D109065 - don't truncate a broadcast source if the source has multiple uses.
2021-09-19 22:53:30 +01:00
Craig Topper 391fa371fd [X86] Remove Commutable flag from mpsadbw intrinsics.
Unlike psadbw, mpsadbw is not commutable because of how it operates
on blocks. We already marked as not commutable for MachineIR, but
had it commutable for the tablegened isel patterns.

Fixes PR51908.
2021-09-19 13:22:22 -07:00
Craig Topper 2bde3dcd32 [X86] Add test cases for pr51908. NFC 2021-09-19 13:22:22 -07:00
Simon Pilgrim b7342e3137 [X86] Fold SHUFPS(shuffle(x),shuffle(y),mask) -> SHUFPS(x,y,mask')
We can combine unary shuffles into either of SHUFPS's inputs and adjust the shuffle mask accordingly.

Unlike general shuffle combining, we can be more aggressive and handle multiuse cases as we're not going to accidentally create additional shuffles.
2021-09-19 20:39:19 +01:00
Roman Lebedev 5f2fe48d06
[X86][TLI] SimplifyDemandedVectorEltsForTargetNode(): don't break apart broadcasts from which not just the 0'th elt is demanded
Apparently this has no test coverage before D108382,
but D108382 itself shows a few regressions that this fixes.

It doesn't seem worthwhile breaking apart broadcasts,
assuming we want the broadcasted value to be preset in several elements,
not just the 0'th one.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108411
2021-09-19 17:38:32 +03:00