Commit Graph

33072 Commits

Author SHA1 Message Date
Jessica Paquette 970cb99e0a [GlobalISel] Combine `(x + y) - y -> x` and friends
This adds a combine that handles

```
(x + y) - y -> x
(x + y) - x -> y
x - (y + x) -> 0 - y
x - (x + z) -> 0 - z
```

On AArch64, we get added benefit for `0 - y` because it can be selected to a
`neg` instruction.

Differential Revision: https://reviews.llvm.org/D135010
2022-10-03 10:06:48 -07:00
Philip Reames 21f97fdc97 [DAG] Use getSplatBuildVector in a couple more places [nfc] 2022-10-03 09:48:49 -07:00
Markus Böck 36af4c8418 [SelectionDAG] Fix use-after-free introduced in D130881
The code introduced in https://reviews.llvm.org/D130881 has a bug as it may cause a use-after-free error that can be caught by ASAN.
The bug essentially boils down to iterator invalidation of `DenseMap`. The expression `SDEI[To] = I->second;` may cause `SDEI` to grow if `To` is inserted for the very first time. When that happens, all existing iterators to the map are invalidated as their backing storage has been freed. Accessing `I->second` is then invalid and attempts to access freed memory (as `I` is an iterator of `SDEI`).

This patch fixes that quite simply by first making a copy of `I->second`, and then moving into the possibly newly inserted KV of the ` DenseMap`.

No test attached as I am not sure it is practible to test.

Differential revision: https://reviews.llvm.org/D135019
2022-10-03 15:09:14 +02:00
Petar Avramovic 1fa2019828 [SelectionDAG] Add check for BUILD_VECTOR in isKnownNeverNaN
Includes handling of constants with vector type in isKnownNeverNaN.
For AMDGPU results in not making fcanonicalize during legalization
for vector inputs to fmaxnum_ieee and fminnum_ieee. Does not affect
end result since there is a combine that eliminates fcanonicalize.

Differential Revision: https://reviews.llvm.org/D88573
2022-10-03 12:47:07 +02:00
Amara Emerson 3daf7ddaef [GlobalISel] Allow prelegalizer combiners to have access to LegalizerInfo.
Before, the isPreLegalize() query in CombinerHelper only checked for the
presence of a LegalizerInfo object. This is problematic when we want to have
a combine actually check for legality in a pre-legalizer combine pass, since
if we pass a LegalizerInfo object to the constructor it causes the combines to
think that we're running *post* legalizer, which isn't true.

This change fixes it to instead check an explicit bool that passes to signal
whether the pass will be run before or after legalization.

Doing so exposed a bug in the extending loads combine, which tried to check for
legality of candidate extending loads if LegalizerInfo was present. Since we
only ran it pre-legalizer and therefore with a null LegalizerInfo, it never
actually ran. Also fixes the legality checks to keep the tests passing.

Differential Revision: https://reviews.llvm.org/D135044
2022-10-03 07:36:18 +01:00
David Green 3651635eca [ARM][DAG] BF16 constant handling.
Much like f16 and f32, we shouldn't try to shrink bf16 to smaller fp
constant.  The code may not be optimal, but this allows us to legalize
bf16 constants under Arm without errors.
2022-10-02 11:51:08 +01:00
Yeting Kuo cefb7aab61 [VP][RISCV] Add vp.copysign and RISC-V support.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134935
2022-10-01 10:19:10 +08:00
Eric Wang 5b26f4f042 Reland "[MLGO] ML Regalloc Priority Advisor"
This relands commit 8f4f26ba5b, which was reverted in 91c96a806c because of Buildbot failures. The previous model test is not compatible with tflite. e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041

Differential Revision: https://reviews.llvm.org/D133616
2022-09-30 16:27:26 -05:00
Guozhi Wei feea3b990e [LiveRangeEdit] Add a statistic variable for rematerialization
Add a statistic variable for rematerialization.

Differential Revision: https://reviews.llvm.org/D134907
2022-09-30 19:39:51 +00:00
Simon Pilgrim 61dc5014ac [DAG] Update foldSelectWithIdentityConstant to use llvm::isNeutralConstant
D133866 added the llvm::isNeutralConstant helper to track neutral/passthrough constants

This patch updates foldSelectWithIdentityConstant to use the helper instead of maintaining its own opcode handling

Differential Revision: https://reviews.llvm.org/D134966
2022-09-30 17:46:52 +01:00
Serge Pavlov b3913a9cdf [GlobalISel] Do not crash on widening vector result
Function buildCopyToRegs did not handle properly the case when it should
make wider vector result. It happened, for example, in a function that
returns value of type <2 x f32>, which should be widen to <4 x f32> to
fit XMM register. The function eventually calls
MachineIRBuilder.buildUnmerge, which does not expect that only one
destination register is specified.

Now this case is treated specifically in buildCopyToRegs.

Differential Revision: https://reviews.llvm.org/D128546
2022-09-30 21:30:55 +07:00
Pierre van Houtryve 7388520d1c [GISel] Add more cases to isKnownNeverNaN
Make it even with the DAG implementation as of D134854

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D134857
2022-09-30 14:10:56 +00:00
Pierre van Houtryve 653beae5a1 [AMDGPU][GISel] Add Identity BUILD_VECTOR Combines
Folds-away BUILD_VECTOR-related noops in the post-legalizer combiner.

Depends on D134433

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134953
2022-09-30 14:07:13 +00:00
Amaury Séchet 031a7ad575 [NFC] Fix erroneous indentation. 2022-09-30 12:30:27 +00:00
Yeting Kuo 1cc02b05b7 [SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC.
Using this helper makes work about neutral elements more easier. Although I only
find one case now, I think it will have more chance to be used since so many
combine works are related to neutral elements.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133866
2022-09-30 11:29:11 +08:00
Mircea Trofin 91c96a806c Revert "[MLGO] ML Regalloc Priority Advisor"
This reverts commit 8f4f26ba5b.

Buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041
2022-09-29 18:26:40 -07:00
Amaury Séchet 923909afbe [DAG] Simplify the select of constant combine code. NFC 2022-09-30 01:03:14 +00:00
Amaury Séchet d7600c7ccb [DAG] select Cond, C, -1 --> or (sext (not Cond)), C when C is MVT::i1
In the spirit of D130765 . Get rid of cbranches and/or cmov. Usually shorter, but sometime not, becaus eit's hard to prededict when dependency breaking xor will be introduced.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134736
2022-09-30 00:36:58 +00:00
Eric Wang 8f4f26ba5b [MLGO] ML Regalloc Priority Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information and then produce the log file.

Differential Revision: https://reviews.llvm.org/D133616
2022-09-29 16:55:15 -05:00
Thomas Symalla a41dde2c62 [AMDGPU] Add use check in v_fma combine.
In D132837, an existing v_fma combine was extended to regard nested
fma instructions. Originally, the inner FMA was checked for being used
only once. In its current state, this check is missing, which causes
some regressions.

In this patch, this check was added.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134856
2022-09-29 12:25:03 +02:00
Jessica Paquette 1eb49bbab6 [GlobalISel][CallLowering] Use hasRetAttr for return flags on CallBases
Given something like this:

```
declare signext i16 @signext_callee()
define i32 @caller() {
  %res = call i16 @signext_callee()
  ...
}
```

CallLowering would miss that signext_callee's return value is sign extended,
because it isn't on the call.

Use hasRetAttr on the CallBase to allow us to catch this.

(This now inserts G_ASSERT_SEXT/G_ASSERT_ZEXT like in the original review.)

Differential Revision: https://reviews.llvm.org/D86228
2022-09-28 19:38:24 -07:00
Jessica Paquette 704b2e162c [GlobalISel] Add isConstFalseVal helper to Utils
Add a utility function which returns true if the given value is a constant
false value.

This is necessary to port one of the compare simplifications in
TargetLowering::SimplifySetCC.

Differential Revision: https://reviews.llvm.org/D91754
2022-09-28 15:44:26 -07:00
serge-sans-paille 16544cbe64 [iwyu] Move <cmath> out of llvm/Support/MathExtras.h
Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of
that header and include it when needed.

No functional change intended, but there's no longer a transitive include
fromMathExtras.h to cmath.
2022-09-28 20:49:01 +02:00
Aiden Grossman 8d77f8fde7 [MLGO] Add per-instruction MBB frequencies to regalloc dev features
This commit adds in two new features to the ML regalloc eviction
analysis that can be used in ML models, a vector of MBB frequencies and
a vector of indicies mapping instructions to their corresponding basic
blocks. This will allow for further experimentation with per-instruction
features and give a lot more flexibility for future experimentation over
how we're extracting MBB frequency data currently.

Reviewed By: mtrofin, jacobhegna

Differential Revision: https://reviews.llvm.org/D134166
2022-09-28 18:45:04 +00:00
Jay Foad 2c12a04bba [ISel] Fix DAG divergence after new FMA combine
D132837 introduced a new DAG combine that used MorphNodeTo to morph an
FMUL into an FMA. It turns out that MorphNodeTo does not properly update
the divergence bit for users of the morphed node, causing an assertion
failure on the new test case:

llc: SelectionDAG.cpp:10486: void llvm::SelectionDAG::VerifyDAGDivergence(): Assertion `calculateDivergence(N) == N->isDivergent() && "Divergence bit inconsistency detected"' failed.

Fixing MorphNodeTo to propagate the divergence bit is tricky because of
the way it is used to select machine instructions, so use getNode and
ReplaceAllUsesOfValueWith instead.

Differential Revision: https://reviews.llvm.org/D134810
2022-09-28 19:41:51 +01:00
Craig Topper 12357e88af [RISCV][SelectionDAGBuilder] Fix crash when copying a v1f32 vector between basic blocks.
On a rv64 without f32 or vector support, this will be passed across
the basic block as an i64. We need use i32 as an intermediate type
with bitcast and anyext/trunc.

Fixes PR58025

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134758
2022-09-28 10:13:35 -07:00
Matt Arsenault a61c3455c0 AtomicExpand: Use llvm.ptrmask instead of ptrtoint
This removes the ptrtoint from the load's pointer operand, although we
can't entirely eliminate these to get the LSB shift. In a future
patch, this will avoid ptrtoint in the case where the atomic is
overaligned to the word size.
2022-09-28 12:51:30 -04:00
Matt Devereau 0a4771a7e8 [AArch64][SVE] Expand gather index to 32 bits instead of 64 bits
For gathers which load in 8 and 16 bit data then use that data
as an index, the index can be extended to 32 bits instead of
64 bits

Differential Revision: https://reviews.llvm.org/D130692
2022-09-28 14:42:12 +00:00
jacquesguan 465ac0b96e [LegalizeTypes] Use getVectorElementCount to avoid crash of scalable vector.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134718
2022-09-28 17:31:29 +08:00
eopXD 9677d70eb2 [VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134759
2022-09-27 19:45:58 -07:00
eopXD 163cb33854 [VP][RISCV] Add vp.ceil and RISC-V support
Previous commit 8b00b24f85 missed to add `int_ceil` anchor for the
llvm.ceil.* section under LangRef.rst

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 12:04:09 -07:00
eopXD 384b8b3da7 Revert "[VP][RISCV] Add vp.ceil and RISC-V support"
This reverts commit 8b00b24f85.
2022-09-27 11:12:57 -07:00
eopXD 8b00b24f85 [VP][RISCV] Add vp.ceil and RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 11:08:27 -07:00
Craig Topper a6383bb51c [VP][RISCV] Add vp.fmuladd.
Expanded in SelectionDAGBuilder similar to llvm.fmuladd.

Reviewed By: frasercrmck, simoll

Differential Revision: https://reviews.llvm.org/D134474
2022-09-27 10:02:37 -07:00
Amaury Séchet d1baed7c9c [DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1
This seems to be beneficial overall, except for midpoint-int.ll .

The X86 backend seems to generate zeroing that are not necesary.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D131260
2022-09-27 12:54:52 +00:00
Yeting Kuo 04e1301f3d [VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support.
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134639
2022-09-27 13:36:45 +08:00
Paul Scoropan ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Mircea Trofin 8a24e0cb5a [nfc][mlgo] Lazily compute the regalloc reward
Differential Revision: https://reviews.llvm.org/D134664
2022-09-26 15:34:29 -07:00
Craig Topper afdd600a49 [LegalizeTypes][RISCV] Support f16 in ExpandIntRes_LLROUND_LLRINT.
Promote f16 to f32 and use the f32 libcall.

I deleted rv64zfh-half-intrinsics-strict.ll because it only existed due to this issue breaking rv32.

Differential Revision: https://reviews.llvm.org/D134579
2022-09-26 11:09:33 -07:00
Amaury Séchet b30bbd181b Small formating nit in DAGCombiner. NFC 2022-09-26 13:36:11 +00:00
Yeting Kuo 43c5fbdd3a [VP][RISCV] Add vp.sqrt intrinsic and RISC-V support.
The patch modeled vp.fabs patch D132793.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133690
2022-09-26 10:47:40 +08:00
Yi Kong 32994b7357 Make MLIR model URLs cache variables
This allows us to directly use the models published on Github.

Differential Revision: https://reviews.llvm.org/D134566
2022-09-23 15:21:53 -07:00
Afanasyev Ivan d2e434c378 [RegisterCoalescer] fix dst subreg replacement during remat copy trick
Instructions might use definition register as its "undef" operand. It
happens on architectures with predicated executon:

```
%0:subreg = instruction op_1, ..., op_N, undef %0:subreg, op_N+2, ...
```

RegisterCoalescer should take into account all remat instruction
operands during destination subregister fixup.

```
; remat result before fix:
%1 = instruction op_1, ..., op_N, undef %1:subreg, op_N+2, ...

; remat result after fix (correct):
%1 = instruction op_1, ..., op_N, undef %1, op_N+2, ...
```

Differential Revision: https://reviews.llvm.org/D125657
2022-09-23 18:52:29 +00:00
Philip Reames b9c4733079 [DAG] Move one-use add of splat to base of scatter/gather
This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from vector to scalar domain.

The motivation is to improve the lowering of scatter/gather operations fed by complex geps.

Differential Revision: https://reviews.llvm.org/D134472
2022-09-22 18:45:12 -07:00
Eric Wang 83c53d346f [NFC][MLGO] Introduce logRewardIfNeeded method
This patch introduces a logRewardIfNeeded method to reuse regallocscoring.

Differential Revision: https://reviews.llvm.org/D134232
2022-09-22 19:22:32 -05:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Craig Topper 9d236d4dab [SelectionDAGBuilder] Simplify how VTs is created for constrained intrinsics. NFC
All constrained intrinsics return a single value. We can directly
convert it to an EVT instead of going through ComputeValueTypes.
2022-09-22 14:21:22 -07:00
Philip Reames 46525fee81 [DAGCombine] Check both forms of a commutative transform
The transform to fold an add into the base of a scatter/gather was only checking to see if the LHS was a splat.  Included test change indicates that splats are not canonicalized to LHS, and that we need to check both sides.
2022-09-22 12:21:47 -07:00
Jonathan Camilleri 08288052ae [DebugInfo] Emit access specifiers for typedefs
The accessibility level of a typedef or using declaration in a
struct or class was being lost when producing debug information.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D134339
2022-09-22 17:08:41 +00:00
Amara Emerson 885a87033c [GlobalISel] Enforce G_ASSERT_ALIGN to have a valid alignment > 0. 2022-09-22 16:05:07 +01:00
Matt Arsenault 94ebd7d9ff MachineVerifier: Verify REG_SEQUENCE
Somehow there was no verification of this, other than an ad-hoc
assertion in TwoAddressInstructions.
2022-09-22 09:51:15 -04:00
Christudasan Devadasan 32a8260ccc -dot-machine-cfg for printing MachineFunction to a dot file
This pass allows a user to dump a MIR function to a dot file
and view it as a graph. It is targeted to provide a similar
functionality as -dot-cfg pass on LLVM-IR. As of now the pass
also support below flags:
-dot-mcfg-only [optional][won't print instructions in the
graph just block name]
-mcfg-dot-filename-prefix [optional][prefix to add to output dot file]
-mcfg-func-name [optional] [specify function name or it's
substring, handy if mir file contains multiple functions and
you need to see graph of just one]

More flags and details can be introduced as per the requirements
in future. This pass is inspired from -dot-cfg IR pass and APIs
are written in almost identical format.

Patch by Yashwant Singh <Yashwant.Singh@amd.com> (yassingh)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133709
2022-09-22 12:48:33 +05:30
Fangrui Song 2d975f1efe [GlobalISel] Fix std::max after D134380 2022-09-21 14:09:04 -07:00
Amara Emerson 85cd376f70 [GlobalISel] Fix known bits for G_ASSERT_ALIGN.
I don't know what was going on originally with these tests. It seems reasonable
to have the immediate be the same byte alignment unit as the IR, in which case
we need to take the log2 in order to set the right number of low bits.

This fixes a miscompile in chromium.

Differential Revision: https://reviews.llvm.org/D134380
2022-09-21 21:34:05 +01:00
Guozhi Wei c39311eb40 [RegisterCoalescer] Use LiveRangeEdit to handle rematerialization
This patch uses the API provided by LiveRangeEdit to handle rematerialization.
It will make future maintenance and improvement more easier.

No functional change.

Differential Revision: https://reviews.llvm.org/D133610
2022-09-21 17:51:07 +00:00
Philip Reames 143f3bf8f4 [SDAG] Split handling of VPLoad/VPGather and VPStore/VPScatter [nfc]
The merged routines are not-idiomatic, and the code sharing that results is prettty minimal.  The confusion factor is not justified.
2022-09-21 09:06:02 -07:00
Thomas Symalla c98a46fee6 [ISel] Enable generating more fma instructions.
This patch changes a FADD / FMUL => FMA ISel pattern implemented
in D80801 so that it peeks through more than one FMA.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132837
2022-09-21 12:03:11 +02:00
Matt Arsenault b9a371f6d1 AtomicExpand: Use correct pointer size for integer
This was using the default address space.
2022-09-20 16:51:05 -04:00
Amara Emerson 78833a43e8 [GlobalISel][Legalizer] Fix lowerSelect() not sign-extending the mask value.
I'm not sure why the SEXT_INREG was gated on a bitwidth check of the mask
vs element size.

This fixes a miscompile in chromium's skia library.

Differential Revision: https://reviews.llvm.org/D134236
2022-09-20 16:40:34 +01:00
Matt Arsenault 34fb7803f8 GlobalISel: Pass through AssumptionCache 2022-09-19 19:10:51 -04:00
Matt Arsenault bcb931c484 SelectionDAG: Add AssumptionCache analysis dependency
Fixes compile time regression after
bb70b5d406
2022-09-19 19:10:51 -04:00
Matt Arsenault 0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Simon Pilgrim 8206044183 [DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling
Allows to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns
2022-09-19 12:44:43 +01:00
Simon Pilgrim 47cfe71027 [DAG] MatchRotate - reuse existing LHSShiftArg/RHSShiftArg variables. NFC. 2022-09-18 14:35:10 +01:00
Kai Nacke ae35188f97 [GISel] Fix match tree emitter.
The following changes are necessasy to get the generated tree
matcher to compile:

- In CodeExpansions::declare(), the assert() prevents connecting
  two instructions. E.g. the match code
    (match (MUL $t, $s1, $s2),
           (SUB $d, $t, $s3)),
  results in two declarations of $t, one for the def and one for
  the use. Removing the assertion allows this construct.
  If $t is later used, it is one of the operands, which should be
  perfectly fine.
- The code emitted in GIMatchTreeVRegDefPartitioner::generatePartitionSelectorCode()
  is not compilable:
  - The value of NewInstrID should be emitted, not the name
  - Both calls involving getOperand() end with one parenthesis too many
- Swaps generated condition for the partition code in the latter function

It also changes the rules i2p_to_p2i, fabs_fabs_fold, and fneg_fneg_fold
to use the tree matcher for a linear match. These rules are tested by:

CodeGen/AArch64/GlobalISel/combine-fabs.mir
CodeGen/AArch64/GlobalISel/combine-fneg.mir
CodeGen/AArch64/GlobalISel/combine-ptrtoint.mir
CodeGen/AMDGPU/GlobalISel/combine-add-nullptr.mir

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133257
2022-09-18 00:00:15 +00:00
Aiden Grossman e5e3dccd07 [mlgo] Add in-development instruction based features for regalloc advisor
This patch adds in instruction based features to the regalloc advisor
gated behind a flag so a user can decide at runtime whether or not they
want to enable the feature. The features are only enabled when LLVM is
compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true.

To extract the instruction features, I'm taking a list of segments from
each LiveInterval and noting the start and end SlotIndices. This list is then
sorted based on the start SlotIndex and I iterate through each SlotIndex
to grab instructions, making sure to check for overlaps. This results in
a vector of opcodes and binary mapping matrix that maps live ranges to the
opcodes of the instructions within that LR.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131930
2022-09-17 19:54:45 +00:00
Kazu Hirata 20d764aff0 [llvm] Don't including SetVector.h (NFC)
llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without
including SetVector.h, so this patch adds an appropriate #include
there.
2022-09-17 12:36:43 -07:00
Vladislav Dzhidzhoev 6cf11f4462 [GlobalISel][DebugInfo] Salvage trivially dead instructions
Use salvageDebugInfo for instructions erased as trivially dead in
GlobalISel.

It would be helpful to implement support of G_PTR_ADD and G_FRAME_INDEX
in salvageDebugInfo in future in order to preserve more variable
location.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D133986
2022-09-17 03:54:55 +03:00
Sotiris Apostolakis b827e7c600 [SelectOpti] Restrict load sinking
This is a follow-up to D133777, which resolved a use-after-free case but
did not cover all possible memory bugs due to misplacement of loads.

In short, the overall problem was that sinked loads could be moved after
state-modifying instructions leading to memory bugs.

The solution is to restrict load sinking unless it is found to be sound.
i) Within a basic block (to-be-sinked load and select-user are in the same BB),
loads can be sinked only if there is no intervening state-modifying instruction.
This is a conservative approach to avoid resorting to alias analysis to detect
potential memory overlap.
ii) Across basic blocks, sinking of loads is avoided. This is because going over
multiple basic blocks looking for memory conflicts could be computationally
expensive and also unlikely to allow loads to sink. Further, experiments showed
that not sinking these loads has a slight positive performance effect.
Maybe for some of these loads, having some separation allows enough time
for the load to be executed in time for its user. This is not the case for
floating point operations that benefit more from sinking.

The solution in D133777 was essentially undone in this patch,
since the latter is a complete solution to the observed problem.

Overall, the performance impact of this patch is minimal.
Tested on two internal Google workloads with instrPGO.
Search application showed <0.05% perf difference,
while the database one showed a slight improvement,
but not statistically significant.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D133999
2022-09-16 20:50:46 +00:00
Jessica Paquette 1076b31da8 [GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum
This is a partial port of the code used by the SelectionDAGBuilder to
translate selects.

In particular, see matchSelectPattern in ValueTracking.cpp. This is a
GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum.

I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler.
On the AArch64-end, it seems like the FP cases are more important for perf
right now, so I bit the bullet and went at the more complicated problem. :)

I elected to do this as a post-legalize combine rather than in the
IRTranslator because

Deciding which fmax/fmin to use can depend on legalization rules
Philosophically-speaking (TM), putting it in a combine just feels cleaner

Being able to enable/disable the combine is handy
Another option would be to use the ValueTracking code in the IRTranslator and
match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat
annoying since we'd need to write lowerings back into the selects in the
legalizer. I'm not strongly opposed to the approach.

We'd also want to be careful with vector selects once that's implemented,
which explicitly check if a vector select is legal on the target. That'd
probably need a hook.

From what I can tell, doing this as a combine is probably a cleaner option
long-term.

Differential Revision: https://reviews.llvm.org/D116702
2022-09-16 13:35:46 -07:00
Pengxuan Zheng 59365f33e2 [MachineCSE] Add a threshold to avoid spending too much time in isProfitableToCSE
Currently, it can become extremely costly to compute MayIncreasePressure if the
size of CSUses turns out to be very large. In that case, it's no longer cost
effective to keep computing MayIncreasePressure. Therefore, to limit the amount
of time spent in isProfitableToCSE, we simply conservatively assume
MayIncreasePressure if the size of CSUses is too large. This can reduce overall
compile time by 30% for some benchmarks.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134003
2022-09-16 13:35:17 -07:00
Craig Topper 1121eca685 [VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.
I want to default all VP operations to Expand. These 2 were blocking
because VE doesn't support them and the tests were expecting them
to fail a specific way. Using Expand caused them to fail differently.

Seemed better to emulate them using operations that are supported.

@simoll mentioned on Discord that VE has some expansion downstream. Not
sure if its done like this or in the VE target.

Reviewed By: frasercrmck, efocht

Differential Revision: https://reviews.llvm.org/D133514
2022-09-16 13:19:02 -07:00
David Majnemer 8a868d8859 Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView""
This reverts commit cd20a18286 and adds a
"let Heading" to NoStackProtectorDocs.
2022-09-16 19:39:48 +00:00
Stanislav Mekhanoshin a0c8f5fefa [SDAG] Print divergence in SDNode::dump
If target does not support divergence the field is set to false
and not printed.

Differential Revision: https://reviews.llvm.org/D133984
2022-09-16 11:43:34 -07:00
Juan Manuel MARTINEZ CAAMAÑO e438f2d821 [DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are
used, and MUL_LOHI is available

Folding into a sra(mul) / srl(mul) into a mulh introduces an extra
multiplication to compute the high half of the multiplication,
while it is more profitable to compute the high and lower halfs with a
single mul_lohi.

Differential Revision: https://reviews.llvm.org/D133768
2022-09-16 15:48:36 +00:00
Liqiang Tao 2e37557fde StackProtector: ensure stack checks are inserted before the tail call
The IR stack protector pass should insert stack checks before the tail
calls not only the musttail calls. So that the attributes `ssqreq` and
`tail call`, which are emited by llvm-opt, could be both enabled by
llvm-llc.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D133860
2022-09-16 22:24:46 +08:00
Florian Hahn 6b86b481e3
[AArch64] Use tbl for truncating vector FPtoUI conversions.
On AArch64, doing the vector truncate separately after the fptoui
conversion can be lowered more efficiently using tbl.4, building on
D133495.

https://alive2.llvm.org/ce/z/T538CC

Depends on D133495

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133496
2022-09-16 14:57:43 +01:00
Florian Hahn 8491d01cc3
[AArch64] Lower vector trunc using tbl.
Similar to using tbl to lower vector ZExts, tbl4 can be used to lower
vector truncates.

The initial version support i32->i8 conversions.

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133495
2022-09-16 12:42:49 +01:00
Nikita Popov b4309800e9 [CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692)
Callee save registers must be preserved, so -fzero-call-used-regs
should not be zeroing them. The previous implementation only did
not zero callee save registers that were saved&restored inside the
function, but we need preserve all of them.

Fixes https://github.com/llvm/llvm-project/issues/57692.

Differential Revision: https://reviews.llvm.org/D133946
2022-09-16 11:52:29 +02:00
Florian Hahn 5871f18827
[AArch64] Lower extending uitofp using tbl.
On AArch64, doing the zero-extend separately first can be lowered more
efficiently using tbl, building on D120571.

https://alive2.llvm.org/ce/z/8Je595

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133494
2022-09-16 10:20:25 +01:00
Yuta Mukai 116838b151 [MachinePipeliner] Fix the interpretation of the scheduling model
The method of counting resource consumption is modified to be based on
"Cycles" value when DFA is not used.

The calculation of ResMII is modified to total "Cycles" and divide it
by the number of units for each resource. Previously, ResMII was
excessive because it was assumed that resources were consumed for
the cycles of "Latency" value.

The method of resource reservation is modified similarly. When a
value of "Cycles" is larger than 1, the resource is considered to be
consumed by 1 for cycles of its length from the scheduled cycle.
To realize this, ResourceManager maintains a resource table for all
slots. Previously, resource consumption was always 1 for 1 cycle
regardless of the value of "Cycles" or "Latency".

In addition, the number of micro operations per cycle is modified to
be constrained by "IssueWidth". To disable the constraint,
--pipeliner-force-issue-width=100 can be used.

For the case of using DFA, the scheduling results are unchanged.

Reviewed By: dpenry

Differential Revision: https://reviews.llvm.org/D133572
2022-09-16 09:51:48 +09:00
Alexander Timofeev fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Florian Hahn 81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle  creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no way other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Matt Arsenault 63d1d37d35 RegAllocGreedy: Avoid overflowing priority bitfields
The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.

AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principal we would have wanted this, but in the
cases I've looked at, it had the counter intuitive effect and
de-prioritized the large register tuple.

Avoid using weird bit hack PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing this to 6-bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
2022-09-15 10:38:40 -04:00
Stanislav Mekhanoshin ef4b9c33f5 Fix crash while printing MMO target flags
MachineMemOperand::print can dereference a NULL pointer if TII
is not passed from the printMemOperand. This does not happen while
dumping the DAG/MIR from llc but crashes the debugger if a dump()
method is called from gdb.

Differential Revision: https://reviews.llvm.org/D133903
2022-09-14 17:29:48 -07:00
Roland Froese 207228c1d6 [DAGCombiner] More load-store forwarding for big-endian
Get some load-store forwarding cases for big-endian where a larger store covers
a smaller load, and the offset would be 0 and handled on little-endian but on
big-endian the offset is adjusted to be non-zero. The idea is just to shift the
data to make it look like the offset 0 case.

Differential Revision: https://reviews.llvm.org/D130115
2022-09-14 15:36:35 -04:00
Marco Elver 4627a30acf [MIR] Support printing and parsing pcsections
Adds support for printing and parsing PC sections metadata in MIR.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133785
2022-09-14 10:30:25 +02:00
YongKang Zhu 5fa6b24354 Address feedback in https://reviews.llvm.org/D133637
https://reviews.llvm.org/D133637 fixes the problem where we should hash raw content of
register mask instead of the pointer to it.

Fix the same issue in `llvm::hash_value()`.

Remove the added API `MachineOperand::getRegMaskSize()` to avoid potential confusion.

Add an assert to emphasize that we probably should hash a machine operand iff it has
associated machine function, but keep the fallback logic in the original change.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D133747
2022-09-13 16:12:41 -07:00
Matt Arsenault d00b0aab31 RegisterCoalescer: Fix verifier error when merging copy of undef
There's no real read of the register, so the copy introduced a new
live value. Make sure we introduce a replacement implicit_def instead
of just erasing the copy.

Found from llvm-reduce since it tries to set undef on everything.
2022-09-13 18:40:28 -04:00
Sotiris Apostolakis eda61fb656 [SelectOpti] Fix lifetime intrinsic bug
When a select is converted to a branch and load instructions are sinked to the true/false blocks,
lifetime intrinsics (if present) could be made unsound if not moved.

This conservatively moves all lifetime intrinsics in a transformed BB to the end block to ensure
preserved lifetime semantics.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D133777
2022-09-13 19:00:18 +00:00
Craig Topper efd5acf120 [LegalizeTypes][NVPTX] Remove extra compare from fallback code for ISD::ADD in ExpandIntRes_ADDSUB.
This is the ultimate fallback code if UADDO isn't supported.

If the target uses 0/1 we used one compare, but if the target doesn't
use 0/1 we emitted two compares. Regardless of boolean constants we
should only need to check that the Result is less than one of the
original operands. So we only need one compare.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D133708
2022-09-13 09:07:56 -07:00
Hendrik Greving 393a17b5d1 [ValueTypes] Define MVTs for v256i2/v128i4.
Adds MVT::v256i2, MVT::v128i4.

Differential Revision: https://reviews.llvm.org/D133603
2022-09-13 09:02:23 -07:00
Matt Arsenault 740f920a1f LiveRegUnits: Break register loop when a clobber is encountered 2022-09-13 10:15:08 -04:00
Matt Arsenault b7dae832e6 DeadMachineInstructionElim: Don't repeat per-function init
This was happening for every iteration but only needs to be done once.
2022-09-13 08:19:54 -04:00
Matt Arsenault 243632c63e LiveRegUnits: Cleanup isReg checks
This is the common case and should be checked first. Provides a very
marginal compile time improvement on the example I'm looking at.
2022-09-13 08:19:53 -04:00
Pavel Samolysov 02aaf8e3d6 [NFC][ScheduleDAGInstrs] Use structure bindings and emplace_back
Some uses of std::make_pair and the std::pair's first/second members
in the ScheduleDAGInstrs.[cpp|h] files were replaced with using of the
vector's emplace_back along with structure bindings from C++17.
2022-09-13 12:49:04 +03:00
Sylvestre Ledru cd20a18286 Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"
Causing:
https://github.com/llvm/llvm-project/issues/57709

This reverts commit ab56719acd.
2022-09-13 10:53:59 +02:00
David Green f124e59b2e [TypePromotionPass] Don't treat phi's as ToPromote
This attempts to stop the type promotion pass transforming where it is
not profitable, by not marking PhiNodes as ToPromote and being more
aggressive about pulling extends out of loops.

Differential Revision: https://reviews.llvm.org/D133203
2022-09-13 08:57:15 +01:00
Matt Arsenault 920b2e65fc DeadMachineInstructionElim: Fix typo 2022-09-12 20:10:33 -04:00