Commit Graph

21411 Commits

Author SHA1 Message Date
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst a IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Simon Pilgrim 2a419a0b99 [X86][SSE] combineX86ShuffleChain - check if we're blending with zero into already zero elements
Add a SelectionDAG::MaskedElementsAreZero helper that wraps SelectionDAG::MaskedValueIsZero testing for entirely zero vector elements
2021-04-20 17:09:49 +01:00
Nick Desaulniers c440b97d89 [TargetLowering] move "o" and "X" constraint handling to base class
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D100416
2021-04-19 10:53:31 -07:00
Roman Lebedev df9597cf5a
[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap
This is similar to the subvector extractions,
except that the 0'th subvector isn't free to insert,
because we generally don't know whether or not
the upper elements need to be preserved:
https://godbolt.org/z/rsxP5W4sW

This is needed to avoid regressions in D100684

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100698
2021-04-19 13:24:58 +03:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Roman Lebedev b06c55a698
[X86][CostModel] Fix cost model for non-power-of-two vector load/stores
Sometimes LV has to produce really wide vectors,
and sometimes they end up being not powers of two.
As it can be seen from the diff, the cost computation
is currently completely non-sensical in those cases.

Instead of just scalarizing everything, split/factorize the wide vector
into a number of subvectors, each one having a power-of-two elements,
recurse to get the cost of op on this subvector. Also, check how we'd
legalize this subvector, and if the legalized type is scalar,
also account for the scalarization cost.

Note that for sub-vector loads, we might be able to do better,
when the vectors are properly aligned.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100099
2021-04-16 15:30:57 +03:00
Simon Pilgrim 9d57a77b81 [X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0)
If we are truncating from a i32 source before comparing the result against zero, then see if we can directly compare the source value against zero.

If the upper (truncated) bits are known to be zero then we can compare against that, hopefully increasing the chances of us folding the compare into a EFLAG result of the source's operation.

Fixes PR49028.

Differential Revision: https://reviews.llvm.org/D100491
2021-04-15 13:55:51 +01:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
Simon Pilgrim 4fbe761572 [X86][SSE] canonicalizeShuffleWithBinOps - check for more combos of merge-able binary shuffles.
In the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)), check if both X/Z AND Y/W have at least one merge-able shuffle in which case the total number of shuffle should still fall.

Helps with instruction count regressions we saw while fixing PR48823
2021-04-14 15:24:41 +01:00
Simon Pilgrim 73737fe990 [X86] Fold cmpeq/ne(trunc(x),0) --> cmpeq/ne(x,0)
Relax the fold from rGbaadbe04bf75 to compare any op, not just logic ops, now that the movmsk regressions have been handled.
2021-04-14 11:02:02 +01:00
Simon Pilgrim 016ceb8382 [X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) fold
Extension to rG74f98391a7a4, we can also include any of the upper (known zero) bits in the comparison in the shuffle removal fold, just as long as we demand all the elements of the movmsk source vector.
2021-04-14 11:02:01 +01:00
Bogdan Graur 0acf4e5005
[NFC] Fix unused warning.
Differential Revision: https://reviews.llvm.org/D100449
2021-04-14 09:09:20 +02:00
Wang, Pengfei a3b52a9d13 [X86][AMX] Refactor for PostRA ldtilecfg pass.
This is a follow up of D99010. We didn't consider the live range of shape registers when hoist ldtilecfg. There maybe risks, e.g. we happen to insert it to an invalid range of some registers and get unexpected error.

This patch fixes this problem by storing the value to corresponding stack place of ldtilecfg after all its definition immediately.

This patch also fix a problem in previous code: If we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for other ldtilecfg.

There're still some optimization points left. E.g. eliminate unused mov instructions, break the def-use dependency before RA etc.

Reviewed By: LuoYuanke, xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D99966
2021-04-14 10:08:23 +08:00
Simon Pilgrim 74f98391a7 [X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in CMP(MOVMSK(PACKSS())) -> CMP(MOVMSK()) fold
We already allow the comparison of the upper bits of 'IsAllOf' (allbits) patterns, but we can safely compare the known zero bits for 'IsAnyOf' (zerobits) patterns as well.

This fixes an issues where we are comparing a type wide than the number of vector elements, which avoids a regression mentioned in rGbaadbe04bf75.
2021-04-13 17:37:24 +01:00
Sander de Smalen 03f47bdcb1 [TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100205
2021-04-13 14:21:02 +01:00
Sander de Smalen d676b5749d [TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100204
2021-04-13 14:21:01 +01:00
Sander de Smalen db134e2428 [TTI] NFC: Change getCmpSelInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100203
2021-04-13 14:21:01 +01:00
Sander de Smalen 2285dfb73f [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen fd1f8a5462 [TTI] NFC: Change getGatherScatterOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100200
2021-04-13 14:20:59 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
Freddy Ye 3fc1fe8db8 [X86] Support -march=rocketlake
Reviewed By: skan, craig.topper, MaskRay

Differential Revision: https://reviews.llvm.org/D100085
2021-04-13 09:48:13 +08:00
Simon Pilgrim baadbe04bf [X86] Fold cmpeq/ne(trunc(logic(x)),0) --> cmpeq/ne(logic(x),0)
Fixes the issues noted in PR48768, where the and/or/xor instruction had been promoted to avoid i8/i16 partial-dependencies, but the test against zero had not.

We can almost certainly relax this fold to work for any truncation, although it breaks a number of existing folds (notable movmsk folds which tend to rely on the truncate to determine the demanded bits/elts in the source vector).

There is a reverse combine in TargetLowering.SimplifySetCC so we must wait until after legalization before attempting this.
2021-04-12 16:05:34 +01:00
Wang, Pengfei 4cbaaf4a24 [X86][AMX] Hoist ldtilecfg
The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop.

This patch try to calculate the nearest point where all shapes of AMX registers are reachable.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D99010
2021-04-12 22:36:41 +08:00
Bing1 Yu 747111ea71 [X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D99244
2021-04-12 13:58:14 +08:00
Freddy Ye 5cb47be410 [X86] Remove FeatureCLWB from FeaturesICLClient
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100279
2021-04-12 12:08:59 +08:00
Simon Pilgrim 231b87618b [X86][AVX512] Fold not(kmov(x)) -> kmov(not(x)) and not(widen_subvector(x)) -> widen_subvector(not(x))
Improve AVX512 mask inversion, rG38c799bce801 exposed some missing opportunities to move scalar not() back onto the boolvector types for folding with setcc etc.
2021-04-11 20:07:09 +01:00
Simon Pilgrim 13bdac5709 [X86] combineXor - Pull out repeated getOperand() calls. NFCI. 2021-04-11 19:01:59 +01:00
Simon Pilgrim 38c799bce8 [X86] Fold cmpeq/ne(and(X,Y),Y) --> cmpeq/ne(and(~X,Y),0)
Followup to D100177, handle an similar (demorgan inverse style) case from PR47797 as well

The AVX512 test cases could be further improved if we folded not(iX bitcast(vXi1)) -> (iX bitcast(not(vXi1)))

Alive2: https://alive2.llvm.org/ce/z/AnA_-W
2021-04-11 18:42:01 +01:00
dfukalov 8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Simon Pilgrim d8bc4de3cf [X86] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) on non-BMI targets (PR44136)
Followup to D100177, enable the fold for non-BMI targets as well.
2021-04-09 16:11:11 +01:00
Simon Pilgrim 245036950a [X86][BMI] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) (PR44136)
I've initially just enabled this for BMI which has the ANDN instruction for i32/i64 - the i16/i8 cases give an idea of what'd we get when we enable it in all cases (I'll do this as a later commit).

Additionally, the i16/i8 cases could be freely promoted to i32 (as the args are already zeroext) and we could then make use of ANDN + the free cmp0 there as well - this has come up in PR48768 and PR49028 so I'm going to look at this soon.

https://alive2.llvm.org/ce/z/QVWHP_
https://alive2.llvm.org/ce/z/pLngT-

Vector cases do not appear to benefit from this as we end up with having to generate the zero vector as well - this is one of the reasons I didn't try to tie this into hasAndNot/hasAndNotCompare.

Differential Revision: https://reviews.llvm.org/D100177
2021-04-09 15:52:03 +01:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Simon Pilgrim 3ae0a405fc [X86] combineHorizOpWithShuffle - peek through one use bitcasts when decoding shuffles.
Checking for one use, peek through bitcasts of the horizop args to allows us to merge shuffles of different widths through the horizop.
2021-04-09 10:51:04 +01:00
Simon Pilgrim 302e748065 [X86] Improve optimizeCompareInstr for signed comparisons after AND/OR/XOR instructions
Extend D94856 to handle 'and', 'or' and 'xor' instructions as well

We still fail on many i8/i16 cases as the test and the logic-op are performed on different widths
2021-04-07 14:28:42 +01:00
Simon Pilgrim 583258723f [X86] Improve optimizeCompareInstr for signed comparisons after BZHI instructions
Extend D94856 to handle 'bzhi' instructions as well
2021-04-07 12:07:26 +01:00
Nicolás Alvarez a1aada75f5 [docs] Fix doxygen comments wrongly attached to the llvm namespace
Looking at the Doxygen-generated documentation for the llvm namespace
currently shows all sorts of random comments from different parts of the
codebase. These are mostly caused by:

- File doc comments that aren't marked with \file, so they're attached to
  the next declaration, which is usually "namespace llvm {".
- Class doc comments placed before the namespace rather than before the
  class.
- Code comments before the namespace that (in my opinion) shouldn't be
  extracted by doxygen at all.

This commit fixes these comments. The generated doxygen documentation now
has proper docs for several classes and files, and the docs for the llvm
and llvm::detail namespaces are now empty.

Reviewed By: thakis, mizvekov

Differential Revision: https://reviews.llvm.org/D96736
2021-04-07 01:20:18 +02:00
Simon Pilgrim 53283cc2f1 [X86][SSE] canonicalizeShuffleWithBinOps - add MOVSD/MOVSS handling. 2021-04-06 16:42:18 +01:00
Simon Pilgrim 1dcb5b5e89 [X86] Improve optimizeCompareInstr for signed comparisons after ANDN instructions
Extend D94856 to handle 'andn' instructions as well
2021-04-06 14:16:16 +01:00
Simon Pilgrim 201877d572 [CostModel][X86] Improve accuracy of vXi8 multiply reduction costs
After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....).

Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.
2021-04-06 11:53:22 +01:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Simon Pilgrim 36d4f6d7f8 [X86] Fold xor(zext(xor(x,c1)),c2) -> xor(zext(x),xor(zext(c1),c2))
Fixes PR47603 (second case) by extending rG89afec348dbd3e5078f176e978971ee2d3b5dec8
2021-04-05 11:40:37 +01:00
Roman Lebedev 7727cc242d
[NFC][X86] Split VPMOV* AVX2 instructions into their own sched class
At least on all three Zen's, all such instructions cleanly map
into this new class with no overrides needed.
2021-04-03 22:39:07 +03:00
Nikita Popov 665065821e [FastISel] Remove kill tracking
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.

As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.

Differential Revision: https://reviews.llvm.org/D98294
2021-04-03 15:50:13 +02:00
Simon Pilgrim 89afec348d [X86] Fold xor(truncate(xor(x,c1)),c2) -> xor(truncate(x),xor(truncate(c1),c2))
Fixes PR47603

This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.
2021-04-03 12:43:05 +01:00
Simon Pilgrim 7c17f1ea84 [X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper (REAPPLIED)
Use the getTargetShuffleInputs helper for all shuffle decoding

Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.
2021-04-03 11:59:19 +01:00