llvm-project

Commit Graph

Author	SHA1	Message	Date
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Simon Tatham	01aefae4a1	[ARM,MVE] Add an InstCombine rule permitting VPNOT. Summary: If a user writing C code using the ACLE MVE intrinsics generates a predicate and then complements it, then the resulting IR will use the `pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit integer; complement that integer; and convert back. This will generate machine code that moves the predicate out of the `P0` register, complements it in an integer GPR, and moves it back in again. This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement of the original predicate vector, which we can already instruction- select as the VPNOT instruction which complements P0 in place. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70484	2019-12-02 16:20:30 +00:00
Simon Tatham	f4f77aa53e	[ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i. If you're writing C code using the ACLE MVE intrinsics that passes the result of a vcmp as input to a predicated intrinsic, e.g. mve_pred16_t pred = vcmpeqq(v1, v2); v_out = vaddq_m(v_inactive, v3, v4, pred); then clang's codegen for the compare intrinsic will create calls to `@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an `mve_pred16_t` integer representation, and then the next intrinsic will call `@llvm.arm.mve.pred.i2v` to convert it straight back again. This will be visible in the generated code as a `vmrs`/`vmsr` pair that move the predicate value pointlessly out of `p0` and back into it again. To prevent that, I've added InstCombine rules to remove round trips of the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine about the known and demanded bits of those intrinsics. As a result, you now get just the generated code you wanted: vpt.u16 eq, q1, q2 vaddt.u16 q0, q3, q4 Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70313	2019-11-18 10:39:30 +00:00

Author

SHA1

Message

Date

Sebastian Neubauer

2a6c871596

[InstCombine] Move target-specific inst combining

For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.

D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).

This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic

A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.

This allows to move about 3000 lines out from InstCombine to the targets.

Differential Revision: https://reviews.llvm.org/D81728

2020-07-22 15:59:49 +02:00

Simon Tatham

01aefae4a1

[ARM,MVE] Add an InstCombine rule permitting VPNOT.

Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.

This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70484

2019-12-02 16:20:30 +00:00

Simon Tatham

f4f77aa53e

[ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.

If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.

  mve_pred16_t pred = vcmpeqq(v1, v2);
  v_out = vaddq_m(v_inactive, v3, v4, pred);

then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair
that move the predicate value pointlessly out of `p0` and back into it again.

To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:

  vpt.u16 eq, q1, q2
  vaddt.u16 q0, q3, q4

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70313

2019-11-18 10:39:30 +00:00

3 Commits