Commit Graph

270 Commits

Author SHA1 Message Date
Simon Pilgrim 5d3a86489f [GlobalISel] Move getOpcode() calls inside assert() to avoid (void)s. NFC.
Tidier solution to the unused variable warnings - we already do this in other places in this file.
2022-02-07 09:50:27 +00:00
Djordje Todorovic def10a2895 [GlobalIsel] Fix another "unused variable" warning 2022-02-07 09:32:22 +01:00
Djordje Todorovic eab395fa40 Fix the warning after D118805
A variable was used within assert() only.
2022-02-07 09:25:02 +01:00
Róbert Ágoston cd4ed08b5a [GlobalISel] Don't combine instructions which are fed by memory instructions using different size
Memory instructions like extending loads from the same address are not equal if
their size is not equal.

This fixes https://github.com/llvm/llvm-project/issues/53524.

Differential Revision: https://reviews.llvm.org/D118805
2022-02-04 15:00:47 -08:00
Jessica Paquette 9a61e731ff [GlobalISel] Combine (G_*ADDO x, 0) -> x + no carry out
Similar to the G_*MULO change.

The code for checking if a constant is legal/pre-legalize is shared between
these, and is kind of hairy. So, factor it out into a new function:
`isConstantLegalOrBeforeLegalizer`.

To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into
a wrapper for two functions:

- `isPreLegalize`
- `isLegal`

This is a bit easier to read in general.

https://godbolt.org/z/KW7oszP1o

Differential Revision: https://reviews.llvm.org/D118655
2022-02-03 14:25:15 -08:00
Jessica Paquette c636899dc1 [GlobalISel] Combine: (G_*MULO x, 0) -> 0 + no carry out
Similar to the following combine in `DAGCombiner::visitMULO`:

```
  // fold (mulo x, 0) -> 0 + no carry out
  if (isNullOrNullSplat(N1))
    return CombineTo(N, DAG.getConstant(0, DL, VT),
                     DAG.getConstant(0, DL, CarryVT));
```

This fixes some generally poor codegen for `*mulo`:

https://godbolt.org/z/eTxYsvz8f

Differential Revision: https://reviews.llvm.org/D118635
2022-02-03 14:23:58 -08:00
Sebastian Neubauer 4723f3cf03 [AMDGPU][GlobalISel] Combine unmerge of undef
Fold (unmerge undef) -> undef, undef, ...

Differential Revision: https://reviews.llvm.org/D118138
2022-01-26 12:30:36 +01:00
Abinav Puthan Purayil 68b70d17d8 [GlobalISel] Fold or of shifts with constant amount to funnel shift.
This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0
and C1 are constants where C0 + C1 is the bit-width of the shift
instructions.

Differential Revision: https://reviews.llvm.org/D116529
2022-01-24 10:43:32 +05:30
Lucas Prates 283f5a198a [GlobalISel] Fix incorrect sign extension when combining G_INTTOPTR and G_PTR_ADD
The GlobalISel combiner currently uses sign extension when manipulating
the LHS constant when combining a sequence of the following sequence of
machine instructions into a single constant:
```
  %0:_(s32) = G_CONSTANT i32 <CONSTANT>
  %1:_(p0) = G_INTTOPTR %0:_(s32)
  %2:_(s64) = G_CONSTANT i64 <CONSTANT>
  %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
```

This causes an issue when the bit width of the first contant and the
target pointer size are different, as G_INTTOPTR has no sign extension
semantics.

This patch fixes this by capture an arbitrary precision in when matching
the constant, allowing the matching function to correctly zero extend
it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D116941
2022-01-20 17:02:52 +00:00
Petar Avramovic c8c5dc766b GlobalIsel: Fix fma combine when one of the operands comes from unmerge
Fma combine assumes that MRI.getVRegDef(Reg)->getOperand(0).getReg() = Reg
which is not true when Reg is defined by instruction with multiple defs
e.g. G_UNMERGE_VALUES.
Fix is to keep register and the instruction that defines register in
DefinitionAndSourceRegister and use when needed.

Differential Revision: https://reviews.llvm.org/D117032
2022-01-12 17:47:25 +01:00
Jay Foad 50fb44eebb [GlobalISel] Use getPreferredShiftAmountTy in one more G_UBFX combine
Change CombinerHelper::matchBitfieldExtractFromShrAnd to use
getPreferredShiftAmountTy for the shift-amount-like operands of G_UBFX
just like all the other G_[SU]BFX combines do. This better matches the
AMDGPU legality rules for these instructions.

Differential Revision: https://reviews.llvm.org/D116803
2022-01-08 09:20:44 +00:00
Jay Foad ff971873b3 [GlobalISel] Fix legality checks for G_UBFX combines
1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality
   with the correct types for the G_UBFX that it builds.
2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to
   match the legality rules: result and first operand can be s32 or s64
   but the "shift amount" operands are always s32.
3. Add AMDGPU tests where the post-legalizer combiner would create
   illegal MIR without the above fixes.

Differential Revision: https://reviews.llvm.org/D116802
2022-01-08 09:20:44 +00:00
Jay Foad 3f3fe4a5cf [GlobalISel] Fix typo Extact to Extract in function name. NFC. 2022-01-07 11:13:35 +00:00
Jack Andersen f108c7f59d [GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.
Expanding on D109750.

Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a
substantial performance improvement.

The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vregs uses as valid.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D112852
2021-12-05 15:55:59 -05:00
Abinav Puthan Purayil bc5dbb0bae [GlobalISel] Add matchers for constant splat.
This change exposes isBuildVectorConstantSplat() to the llvm namespace
and uses it to implement the constant splat versions of
m_SpecificICst().

CombinerHelper::matchOrShiftToFunnelShift() can now work with vector
types and CombinerHelper::matchMulOBy2()'s match for a constant splat is
simplified.

Differential Revision: https://reviews.llvm.org/D114625
2021-11-30 15:18:50 +05:30
Mirko Brkusanin 0dd570ff56 [AMDGPU][GlobalISel] Transform (fsub (fpext (fneg (fmul x, y))), z) -> (fneg (fma (fpext x), (fpext y), z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98050
2021-11-29 16:27:22 +01:00
Mirko Brkusanin 37c2a2201d [AMDGPU][GlobalISel] Transform (fsub (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), (fneg z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98049
2021-11-29 16:27:22 +01:00
Mirko Brkusanin 5fe7fcd28e [AMDGPU][GlobalISel] Transform (fsub (fneg (fmul, x, y)), z) -> (fma (fneg x), y, (fneg z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98048
2021-11-29 16:27:22 +01:00
Mirko Brkusanin a782169270 [AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D96614
2021-11-29 16:27:22 +01:00
Mirko Brkusanin e5e49a08f1 [AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fpext (fmul u, v))), z) -> (fma x, y, (fma (fpext u), (fpext v), z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98047
2021-11-29 16:27:21 +01:00
Mirko Brkusanin f732292536 [AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y, (fma u, v, z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D97938
2021-11-29 16:27:21 +01:00
Mirko Brkusanin 8951136216 [AMDGPU][GlobalISel] Transform (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D97937
2021-11-29 16:27:21 +01:00
Mirko Brkusanin 881840fc26 [AMDGPU][GlobalISel] Transform (fadd (fmul x, y), z) -> (fma x, y, z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D93305
2021-11-29 16:27:21 +01:00
Abinav Puthan Purayil 4af45f10cc [GlobalISel] Fold or of shifts to funnel shift.
This change folds a basic funnel shift idiom:
- (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt)
- (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt)

This also helps in folding to rotate shift if x and y are equal since we
already have a funnel shift to rotate combine.

Differential Revision: https://reviews.llvm.org/D114499
2021-11-26 17:05:29 +05:30
Kazu Hirata 259cd6f893 [llvm] Use range-based for loops (NFC) 2021-11-25 22:17:10 -08:00
Kazu Hirata d45cb1d7ea [llvm] Use range-based for loops (NFC) 2021-11-23 08:54:48 -08:00
Mirko Brkusanin db6bc2ab51 [AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods
If possible fold fneg into instruction above if users cannot fold mods and we
know it will decrease instruction count.
Follows same logic as SDAG combiner in choosing opportunities to combine.

Differential Revision: https://reviews.llvm.org/D112827
2021-11-17 14:25:13 +01:00
Jon Roelofs b046eb19b8 [AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856
2021-10-20 12:11:52 -07:00
Jon Roelofs 1300677f97 [AArch64][GlobalISel] combine and + [la]sr => ubfx
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
2021-10-18 10:33:01 -07:00
Amara Emerson 53ebfa7c5d [AArch64][GlobalISel] Fix combiner assertion in matchConstantOp().
We shouldn't call APInt::getSExtValue() on a >64b value.
2021-10-11 15:55:13 -07:00
Roman Lebedev 684cbae89a
[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00
Amara Emerson f95d9c95bb [GlobalISel] Fix the stores of truncates -> wide store combine for non-evenly dividing type sizes.
If the wide store we'd generate is not a multiple of the memory type of the
narrow stores (e.g. s48 and s32), we'd assert. Fix that.
2021-10-09 21:18:20 -07:00
Amara Emerson 17b89f9daa [GlobalISel] Improve G_UMHULH -> LSHR combine to accept non-uniform constant vectors. 2021-10-08 11:25:26 -07:00
Mirko Brkusanin d20840c937 [GlobalISel] Combine for eliminating redundant operand negations
Differential Revision: https://reviews.llvm.org/D111319
2021-10-08 14:29:22 +02:00
Amara Emerson 08b3c0d995 [GlobalISel] Combine G_UMULH x, (1 << c)) -> x >> (bitwidth - c)
In order to not generate an unnecessary G_CTLZ, I extended the constant folder
in the CSEMIRBuilder to handle G_CTLZ. I also added some extra handing of
vector constants too. It seems we don't have any support for doing constant
folding of vector constants, so the tests show some other useless G_SUB
instructions too.

Differential Revision: https://reviews.llvm.org/D111036
2021-10-07 23:51:37 -07:00
Amara Emerson 8bfc0e06dc [GlobalISel] Port the udiv -> mul by constant combine.
This is a straight port from the equivalent DAG combine.

Differential Revision: https://reviews.llvm.org/D110890
2021-10-07 11:37:17 -07:00
Mirko Brkusanin 40e00063bc [GlobalISel] Combine fabs(fneg(x)) to fabs(x)
Differential Revision: https://reviews.llvm.org/D110943
2021-10-05 13:43:39 +02:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Amara Emerson ca8316b704 [GlobalISel] Extend CombinerHelper::matchConstantOp() to match constant splat vectors.
This allows the "x op 0 -> x" fold to optimize vector constant RHSs.

Differential Revision: https://reviews.llvm.org/D110802
2021-09-30 14:31:25 -07:00
Amara Emerson 80f4bb5c61 [GlobalISel] Extend G_SELECT of known condition combine to vectors.
Adds a new utility function: isConstantOrConstantSplatVector().

Differential Revision: https://reviews.llvm.org/D110786
2021-09-30 12:16:44 -07:00
Jessica Paquette 15a24e1fdb [GlobalISel] Combine mulo x, 2 -> addo x, x
Similar to what SDAG does when it sees a smulo/umulo against 2
(see: `DAGCombiner::visitMULO`)

This pattern is fairly common in Swift code AFAICT.

Here's an example extracted from a Swift testcase:

https://godbolt.org/z/6cT8Mesx7

Differential Revision: https://reviews.llvm.org/D110662
2021-09-28 16:59:43 -07:00
Petar Avramovic d477a7c2e7 GlobalISel/Utils: Refactor integer/float constant match functions
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT

Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.

Relevant matchers now return both DefVReg and APInt/APFloat.

Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant

In other places use integer only constant match.

Differential Revision: https://reviews.llvm.org/D104409
2021-09-17 11:22:13 +02:00
Konstantin Schwarz d2e66d7fa4 [GlobalISel] Add a combine for and(load , mask) -> zextload
This only handles simple masks, not shifted masks, for now.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D109357
2021-09-16 10:42:46 +02:00
Ahmed Bougacha 94a2f9cdb6 [GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI.
The doc comment for isPredecessor says:
  Returns true if \p DefMI precedes \p UseMI or they are the same
  instruction.
And dominates relies on that behavior for its own:
  Returns true if \p DefMI dominates \p UseMI. By definition an
  instruction dominates itself.

Make both statements correct by fixing isPredecessor.
Found by inspection.
2021-09-15 16:45:27 -07:00
Amara Emerson 5ec1845cad [AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs.
G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD(X, Y), C)

Improves CTMark -Os on AArch64:

Program            before after  diff
           sqlite3 286932 287024  0.0%
                kc 432512 432508 -0.0%
             SPASS 412788 412764 -0.0%
    pairlocalalign 249460 249416 -0.0%
            bullet 475740 475512 -0.0%
    7zip-benchmark 568864 568356 -0.1%
  consumer-typeset 419088 418648 -0.1%
        tramp3d-v4 367628 367224 -0.1%
          clamscan 383184 382732 -0.1%
            lencod 430028 429284 -0.2%
Geomean difference               -0.1%

Differential Revision: https://reviews.llvm.org/D109528
2021-09-14 23:57:41 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Amara Emerson eae44c8a86 [GlobalISel] Implement merging of stores of truncates.
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the targets supports it.

Assuming little endian target:
 i8 *p = ...
 i32 val = ...
 p[0] = (val >> 0) & 0xFF;
 p[1] = (val >> 8) & 0xFF;
 p[2] = (val >> 16) & 0xFF;
 p[3] = (val >> 24) & 0xFF;
=>
 *((i32)p) = val;

On CTMark AArch64 -Os this results in a good amount of savings:

Program            before        after       diff
             SPASS 412792       412788       -0.0%
                kc 432528       432512       -0.0%
            lencod 430112       430096       -0.0%
  consumer-typeset 419156       419128       -0.0%
            bullet 475840       475752       -0.0%
        tramp3d-v4 367760       367628       -0.0%
          clamscan 383388       383204       -0.0%
    pairlocalalign 249764       249476       -0.1%
    7zip-benchmark 570100       568860       -0.2%
           sqlite3 287628       286920       -0.2%
Geomean difference                           -0.1%

Differential Revision: https://reviews.llvm.org/D109419
2021-09-08 17:06:33 -07:00
Mirko Brkusanin 36527cbe02 [AMDGPU][GlobalISel] Legalize memcpy family of intrinsics
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.

Corresponding intrinsics are replaced by a loop that uses loads/stores in
AMDGPULowerIntrinsics pass unless their length is a constant lower then
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches legalizer should have a const length argument and should be expanded
into appropriate number of loads + stores.

Differential Revision: https://reviews.llvm.org/D108357
2021-09-07 12:24:07 +02:00
Konstantin Schwarz 90d5298759 [GlobalISel] Add convenience constructors to MemDesc
This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161
2021-09-03 12:52:18 +02:00
Jessica Paquette 844d8e0337 [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1
This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130
2021-09-02 15:05:31 -07:00
Konstantin Schwarz 4b4bc1ea16 [GlobalISel] Do not generate illegal G_SEXTLOADs after legalization
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108626
2021-08-25 10:13:39 +02:00
Sebastian Neubauer fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
Jessica Paquette 50efbf9cbe [GlobalISel] Narrow binops feeding into G_AND with a mask
This is a fairly common pattern:

```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```

We have combines to eliminate G_AND with a mask that does nothing.

If we combined the above to this:

```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```

We'd be able to take advantage of those combines using the trunc + zext.

For this to work (or be beneficial in the best case)

- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
  value than performing it at the original width *after masking.*

Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj

At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.

Differential Revision: https://reviews.llvm.org/D107929
2021-08-13 18:31:13 -07:00
Amara Emerson 7ec4ce157b [AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality.
With contributions by Sebastian Neubauer

Differential Revision: https://reviews.llvm.org/D105676
2021-08-10 16:41:18 -07:00
Amara Emerson 4c2e01232c [GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs.
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().

We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
2021-08-07 01:41:02 -07:00
Petar Avramovic 66de26b1f9 GlobalISel: Fix matchEqualDefs for instructions with multiple defs
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.

Differential Revision: https://reviews.llvm.org/D107362
2021-08-05 15:05:45 +02:00
Dominik Montada cc947e29ea [GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107330
2021-08-05 13:52:10 +02:00
Amara Emerson c54d5c9756 [GlobalISel] Use GMergeLikeOp to simplify a combine. NFC. 2021-07-29 13:53:16 -07:00
Amara Emerson 532c458fa8 [GlobalISel] Add GPtrAdd and use it in some combines. 2021-07-29 12:04:02 -07:00
Amara Emerson c658b472f3 [GlobalISel] Add a constant folding combine.
Use it AArch64 post-legal combiner. These don't always get folded because when
the instructions are created the constants are obscured by artifacts.

Differential Revision: https://reviews.llvm.org/D106776
2021-07-26 14:53:33 -07:00
Amara Emerson dec34104bf [GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner.
Differential Revision: https://reviews.llvm.org/D106761
2021-07-26 10:37:31 -07:00
Amara Emerson 03cdb5221d [GlobalISel] Fix load-or combine moving loads across potential aliasing stores.
Although this combine checks that there's no load folding barriers between
the loads that it's trying to merge, it was inserting the load at the
MIRBuilder's default insertion point, which is the G_OR use inst.

This was causing a miscompile in the test suite's
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-bswap-2

Differential Revision: https://reviews.llvm.org/D106251
2021-07-19 10:23:23 -07:00
Matt Arsenault 5a0d940f2a GlobalISel: Preserve memory type for memset expansion 2021-07-16 11:41:32 -04:00
Amara Emerson 4e3dc6b8dd GlobalISel: Introduce GenericMachineInstr classes and derivatives for idiomatic LLVM RTTI.
This adds some level of type safety, allows helper functions to be added for
specific opcodes for free, and also allows us to succinctly check for class
membership with the usual dyn_cast/isa/cast functions.

To start off with, add variants for the different load/store operations with some
places using it.

Differential Revision: https://reviews.llvm.org/D105751
2021-07-15 15:21:57 -07:00
Jessica Paquette 5da0f9ab61 [GlobalISel] Fix infinite loop in reassociationCanBreakAddressingModePattern
It didn't update the opcode while walking through G_INTTOPTR/G_PTRTOINT.

Differential Revision: https://reviews.llvm.org/D106080
2021-07-15 10:09:07 -07:00
Amara Emerson f30251f527 [GlobalISel] Clean up CombinerHelper::apply* functions to return void.
For some reason we/I started writing these as returning bool when the return value
is actually ignored by the combiner.
2021-07-02 13:17:06 -07:00
Amara Emerson 0111da2ef8 [GlobalISel] Add re-association combine for G_PTR_ADD to allow better addressing mode usage.
We're trying to match a few pointer computation patterns here for
re-association opportunities.
1) Isolating a constant operand to be on the RHS, e.g.:
   G_PTR_ADD(BASE, G_ADD(X, C)) -> G_PTR_ADD(G_PTR_ADD(BASE, X), C)

2) Folding two constants in each sub-tree as long as such folding
   doesn't break a legal addressing mode.
   G_PTR_ADD(G_PTR_ADD(BASE, C1), C2) -> G_PTR_ADD(BASE, C1+C2)

AArch64 code size improvements on CTMark with -Os:
Program              before  after   diff
 pairlocalalign      251048  251044 -0.0%
 consumer-typeset    421820  421812 -0.0%
 kc                  431348  431320 -0.0%
 SPASS               413404  413300 -0.0%
 clamscan            384396  384220 -0.0%
 tramp3d-v4          370640  370412 -0.1%
 lencod              432096  431772 -0.1%
 bullet              479400  478796 -0.1%
 sqlite3             288504  288072 -0.1%
 7zip-benchmark      573796  570768 -0.5%
 Geomean difference                 -0.1%

Differential Revision: https://reviews.llvm.org/D105069
2021-07-02 12:31:21 -07:00
Matt Arsenault 28f2f66200 GlobalISel: Use LLT in memory legality queries
This enables proper lowering of non-byte sized loads. We still aren't
faithfully preserving memory types everywhere, so the legality checks
still only consider the size.
2021-06-30 17:44:13 -04:00
Jon Roelofs a642872476 [GISel] Support llvm.memcpy.inline
Differential revision: https://reviews.llvm.org/D105072
2021-06-30 12:39:05 -07:00
Brendon Cahoon f9f5d41545 [AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow both constant or register
values for the offset and width operands.

The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.

There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.

Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.

Differential Revision: https://reviews.llvm.org/D100149
2021-06-28 09:06:44 -04:00
Eli Friedman 74909e4b6e Rename MachineMemOperand::getOrdering -> getSuccessOrdering.
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving.  This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.

We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.

Differential Revision: https://reviews.llvm.org/D103338
2021-06-21 16:49:27 -07:00
Jon Roelofs a2ab765029 [GISel] Eliminate redundant bitmasking
This was a GISel vs SDAG regression that showed up at -Os on arm64 in:
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test

https://llvm.godbolt.org/z/aecjodsjG

Differential revision: https://reviews.llvm.org/D103334
2021-06-17 12:53:00 -07:00
Jessica Paquette e7f501b5e7 [GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width
Also add a target hook which allows us to get around custom legalization on
AArch64.

Differential Revision: https://reviews.llvm.org/D99283
2021-06-01 10:56:17 -07:00
Amara Emerson 57ea5d4f48 [GlobalISel] Fix div+rem -> divrem combine causing use-def violation. 2021-05-19 23:13:41 -07:00
Jessica Paquette 84ae1cf8ed Recommit "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
Add missing REQUIRES line to
prelegalizer-combiner-icmp-to-true-false-known-bits.
2021-05-19 09:29:19 -07:00
Nico Weber 52a7797626 Revert "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
This reverts commit 892497c806.
Breaks tests, see comments on https://reviews.llvm.org/D102542
2021-05-19 09:02:27 -04:00
Jessica Paquette 892497c806 [GlobalISel] Simplify G_ICMP to true/false when the result is known
Use existing KnownBits helpers from KnownBits.h to simplify G_ICMPs.

E.g.

x == x -> true
x != x -> false
load(x) > 1 -> true (when the load is known to be greater than 1)

And so on.

Differential Revision: https://reviews.llvm.org/D102542
2021-05-18 09:26:41 -07:00
Amara Emerson 808bc11d9e [GlobalISel] Don't form zero/sign extending loads for atomics.
For importing patterns, we only support matching G_LOAD, not G_ZEXTLOAD or
G_SEXTLOAD.

Differential Revision: https://reviews.llvm.org/D101932
2021-05-07 16:41:48 -07:00
Amara Emerson 1ccebb18ef [GlobalISel] Micro-optimize the conditional branch optimization.
Convert a check into an assert and pass an MI instead of recomputing in the
apply function.
2021-05-07 00:03:09 -07:00
Amara Emerson 96ec6d91e4 [AArch64][GlobalISel] Simplify out of range rotate amount.
Differential Revision: https://reviews.llvm.org/D101005
2021-04-29 14:05:58 -07:00
Yang Fan 0d7fd9f0d0
[GlobalISel] Fix Wint-in-bool-context warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp: In member function ‘bool llvm::CombinerHelper::matchFunnelShiftToRotate(llvm::MachineInstr&)’:
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:3882:35: warning: ?: using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
 3882 |       Opc == TargetOpcode::G_FSHL ? TargetOpcode::G_ROTL : TargetOpcode::G_ROTR;
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
2021-03-31 09:59:43 +08:00
Amara Emerson 91887cd4ec [AArch64][GlobalISel] Combine funnel shifts to rotates.
Differential Revision: https://reviews.llvm.org/D99388
2021-03-30 11:00:36 -07:00
Jessica Paquette 700431128e [GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.

This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.

Differential Revision: https://reviews.llvm.org/D99230
2021-03-30 10:14:30 -07:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Christudasan Devadasan 4c6ab48fb1 GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM
It is good to have a combined `divrem` instruction when the
`div` and `rem` are computed from identical input operands.
Some targets can lower them through a single expansion that
computes both division and remainder. It effectively reduces
the number of instructions than individually expanding them.

Reviewed By: arsenm, paquette

Differential Revision: https://reviews.llvm.org/D96013
2021-03-10 18:46:07 +05:30
Amara Emerson 55e760769b [GlobalISel] Fold away G_BUILD_VECTOR with all elements extracted.
If every element is extracted from a G_BUILD_VECTOR, pass through the source
registers. This is different to the extract(build_vector) combine because this
one tolerates multiple users as long as they're exhaustive.

Differential Revision: https://reviews.llvm.org/D97890
2021-03-09 11:34:26 -08:00
Amara Emerson e60ab72137 [AArch64][GlobalISel] Add combine for extract_vector_elt(build_vector, cst)
Differential Revision: https://reviews.llvm.org/D97835
2021-03-09 11:08:02 -08:00
Petar Avramovic d44f61f81c Reland [GlobalISel] Combine zext(trunc x) to x
Recommit 4112299ee7. Depends on
4c8fb7ddd6 which was reverted.

Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-05 11:05:37 +01:00
Nico Weber 59beb1ef6d Revert "[GlobalISel] Combine zext(trunc x) to x"
This reverts commit 4112299ee7.
Seems to depend on 4c8fb7ddd6 which
is being reverted.
2021-03-04 10:13:40 -05:00
Petar Avramovic 4112299ee7 [GlobalISel] Combine zext(trunc x) to x
Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-04 15:05:23 +01:00
Jay Foad a6be26710b [GlobalISel] Make more use of replaceSingleDefInstWithReg. NFC. 2021-02-23 17:08:34 +00:00
Amara Emerson 5d6d9b63a3 [GlobalISel] Propagate extends through G_PHIs into the incoming value blocks.
This combine tries to do inter-block hoisting of extends of G_PHIs, into the
originating blocks of the phi's incoming value. The idea is to expose further
optimization opportunities that are normally obscured by the PHI.

Some basic heuristics, and a target hook for AArch64 is added, to allow tuning.
E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine
since it may be folded into the addressing mode during selection.

There are very minor code size improvements on AArch64 -Os, but the real benefit
is that it unlocks optimizations like AArch64 conditional compares on some
benchmarks.

Differential Revision: https://reviews.llvm.org/D95703
2021-02-12 11:52:52 -08:00
Amara Emerson de035c18cf [GlobalISel] Fix sext_inreg(load) combine to not move the originating load.
The builder was using the extend user as the insertion point, which meant that
we were incorrectly "moving" the load from its original position, and therefore
could violate memory operation ordering.
2021-02-11 19:27:09 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Jessica Paquette 02d4b365bf [GlobalISel] Check if branches use the same MBB in matchOptBrCondByInvertingCond
If the G_BR + G_BRCOND in this combine use the same MBB, then it will infinite
loop. Don't allow that to happen.

Differential Revision: https://reviews.llvm.org/D95895
2021-02-02 15:38:48 -08:00
Jessica Paquette cbf5246359 Fix buildbot after cfc6073017
Windows buildbots were not happy with using find_if + instructionsWithoutDebug.

In cfc6073017, instructionsWithoutDebug is not technically necessary. So,
just iterate over the block directly.

http://lab.llvm.org:8011/#/builders/127/builds/4732/steps/7/logs/stdio
2021-01-19 10:38:04 -08:00
Jessica Paquette cfc6073017 [GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)

This tries to recognize patterns like below (assuming a little-endian target):

```
s8* x = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((i32)a)

s8* x = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```

(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)

To recognize the pattern, this searches from the last G_OR in the expression
tree.

E.g.

```
    Reg   Reg
     \    /
      OR_1   Reg
       \    /
        OR_2
          \     Reg
           .. /
          Root
```

Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.

If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)

To simplify things, this patch

(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
    specific location

An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)

At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.

Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.

Differential Revision: https://reviews.llvm.org/D94350
2021-01-19 10:24:27 -08:00
Matt Arsenault 1f9b6ef91f GlobalISel: Add combine for G_UREM by power of 2
Really I want this in the legalizer, but this is a start.
2021-01-07 16:36:35 -05:00
Amara Emerson 7df3544e80 [GlobalISel] Fix assertion failures after "GlobalISel: Return APInt from getConstantVRegVal" landed.
APInt binary ops don't promote types but instead assert, which a combine was
relying on.
2020-12-26 23:51:44 -08:00
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00