Commit Graph

273 Commits

Author SHA1 Message Date
Craig Topper 944cc5e0ab [SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP.
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.

This simplfifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.

CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.

Differential Revision: https://reviews.llvm.org/D76947
2020-04-16 17:49:22 -07:00
Guozhi Wei 6d20937c29 [CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.

There 2 problems blocked the duplication:

  1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
  2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.

The solutions:

  1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
  2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.

Differential Revision: https://reviews.llvm.org/D76539
2020-03-31 11:55:51 -07:00
Juneyoung Lee d82c1e8c56 Rename test name, add more tests for codegenprepare 2020-03-25 20:31:12 +09:00
Juneyoung Lee e951a48996 Add freeze(and x, const) case to codegenprepare's freeze-cmp.ll 2020-03-25 17:29:01 +09:00
Juneyoung Lee 6ad63606ea [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-16 12:46:20 +09:00
Juneyoung Lee 4ffe3ac729 Revert "[CodeGenPrepare] Freeze condition when transforming select to br"
This reverts commit 10aa7ea951.
2020-03-16 12:45:54 +09:00
Juneyoung Lee 10aa7ea951 [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not freezed, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-15 11:10:46 +09:00
Juneyoung Lee c39cb1c0dd [CodeGenPrepare] Expand freeze conversion to support fcmp and icmp with null
Summary:
This is a simple patch that expands https://reviews.llvm.org/D75859 to pointer comparison and fcmp

Checked with Alive2

Reviewers: reames, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76048
2020-03-13 17:21:33 +09:00
Juneyoung Lee 48b901b0e1 Add tests to Transforms/CodeGenPrepare/X86/freeze-cmp.ll before commiting D76048 2020-03-13 17:18:42 +09:00
Juneyoung Lee 629cf3c1c5 Apply update_test_check.py to CodeGenPrepare/X86/freeze-icmp.ll test 2020-03-12 16:37:16 +09:00
Juneyoung Lee 8eb2f865c3 [CodeGenPrepare] Fold br(freeze(icmp x, const)) to br(icmp(freeze x, const))
Summary:
This patch helps CodeGenPrepare move freeze into the icmp when it is used by branch.
It reenables generation of efficient conditional jumps.

This is only done when at least one of icmp's operands is constant to prevent the transformation from increasing # of freeze instructions.

Performance degradation of MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test is resolved with this patch.

Checked with Alive2

Reviewers: reames, fhahn, nlopes

Reviewed By: reames

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75859
2020-03-12 03:16:15 +09:00
Florian Hahn 7769030b93 Recommit "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.

This reverts the revert commit
c7fc0e5da6.
2020-02-23 18:33:18 +00:00
Florian Hahn c7fc0e5da6 Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This reverts commit e01a3d49c2.
and commit a6a585b803.

This causes a failure on GreenDragon:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597
2020-02-19 19:37:08 +01:00
Florian Hahn e01a3d49c2 [PatternMatch] Match XOR variant of unsigned-add overflow check.
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.

This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D74228
2020-02-19 15:25:18 +01:00
Florian Hahn 216afd3301 [TargetLower] Update shouldFormOverflowOp check if math is used.
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.

This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets using the
overflow intrinsic for the overflow check only generates better code.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D74722
2020-02-19 11:28:33 +01:00
Florian Hahn 7cbf710396 [CGP] Precommit tests for D74228. 2020-02-19 09:24:06 +01:00
Florian Hahn 106ae108c1 [CGP] Add uaddo test with math used, SPARC/AArch64 variants. 2020-02-18 12:49:08 +01:00
Clement Courbet 15488ff24b [CodeGen] Fix the computation of the alignment of split stores.
Summary:
Right now the alignment of the lower half of a store is computed as
align/2, which fails for unaligned stores (align = 1), and is overly
pessimitic for, e.g. a 8 byte store aligned to 4 bytes.
Fixes PR44851
Fixes PR44877

Reviewers: gchatelet, spatel, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74311
2020-02-12 10:37:30 +01:00
Clement Courbet 24856002e0 [CodeGenPrepare] Add more store splitting tests for PR44877. 2020-02-12 09:50:47 +01:00
Fangrui Song bf70494b94 [test] More tests to target specific directories after CodeGenPrepare requires TargetPassConfig (D73754) 2020-02-02 10:43:02 -08:00
Fangrui Song eee6a45a13 [CodeGenPrepare][test] Add REQUIRES to two tests after D73754 2020-02-02 09:53:17 -08:00
Fangrui Song 5a56a25b0b [CodeGenPrepare] Make TargetPassConfig required
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73754
2020-02-02 09:28:45 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Hiroshi Yamauchi ed50e6060b [PGO][PGSO] Enable size optimizations in code gen / target passes for cold code.
Summary: Split off of D67120.

Reviewers: davidxl

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71288
2019-12-13 11:01:19 -08:00
Joerg Sonnenberger 9681ea9560 Reapply r374743 with a fix for the ocaml binding
Add a pass to lower is.constant and objectsize intrinsics

This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374784
2019-10-14 16:15:14 +00:00
Dmitri Gribenko 1a21f98ac3 Revert "Add a pass to lower is.constant and objectsize intrinsics"
This reverts commit r374743. It broke the build with Ocaml enabled:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218

llvm-svn: 374768
2019-10-14 12:22:48 +00:00
Joerg Sonnenberger e4300c392d Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374743
2019-10-13 23:00:15 +00:00
David Bolvansky 3d33e97be6 [NFC} Updated test
llvm-svn: 372093
2019-09-17 09:45:52 +00:00
David Green a6e944b173 [CGP] Ensure sinking multiple instructions does not invalidate dominance checks
In MVE, as of rL371218, we are attempting to sink chains of instructions such as:
  %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
  %broadcast.splat26 = shufflevector <8 x i8> %l1, <8 x i8> undef, <8 x i32> zeroinitializer
In certain situations though, we can end up breaking the dominance relations of
instructions. This happens when we sink the instruction into a loop, but cannot
remove the originals. The Use is updated, which might in fact be a Use from the
second instruction to the first.

This attempts to fix that by reversing the order of instruction that are sunk,
and ensuring that we update the uses on new instructions if they have already
been sunk, not the old ones.

Differential Revision: https://reviews.llvm.org/D67366

llvm-svn: 371743
2019-09-12 16:00:07 +00:00
Sam Tebbs f1cdd95a2f [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection
This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together.

This is useful for various MVE instructions, such as vmla and others that take R registers.

Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.

Differential revision: https://reviews.llvm.org/D66295

llvm-svn: 371218
2019-09-06 16:01:32 +00:00
Sanjay Patel acceedb15f [CodeGenPrepare] Fix use-after-free
If OptimizeExtractBits() encountered a shift instruction with no operands at all,
it would erase the instruction, but still return false.

This previously didn’t matter because its caller would always return after
processing the instruction, but https://reviews.llvm.org/D63233 changed the
function’s caller to fall through if it returned false, which would then cause
a use-after-free detectable by ASAN.

This change makes OptimizeExtractBits return true if it removes a shift
instruction with no users, terminating processing of the instruction.

Patch by: @brentdax (Brent Royal-Gordon)

Differential Revision: https://reviews.llvm.org/D66330

llvm-svn: 369168
2019-08-16 23:10:34 +00:00
Sanjay Patel 8341a847a2 [CodeGenPrepare] fix RUN line settings
I'm not sure if this was running as expected with a broken triple.

llvm-svn: 369156
2019-08-16 21:37:49 +00:00
Fangrui Song ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Sanjay Patel c8d88ad1a9 [CodeGenPrepare][x86] shift both sides of a vector select when profitable
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428

Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.

This extends CGP functionality introduced with:
rL201655

Differential Revision: https://reviews.llvm.org/D63233

llvm-svn: 363511
2019-06-16 15:29:03 +00:00
Sanjay Patel 7ea378b940 [CodeGenPrepare] propagate debuginfo when copying a shuffle
llvm-svn: 363409
2019-06-14 15:05:35 +00:00
Sanjay Patel a1421e8347 [x86] add tests for vector shifts; NFC
llvm-svn: 363203
2019-06-12 21:30:06 +00:00
Sanjay Patel 5ab41a7a05 [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.

Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359969
2019-05-04 12:46:32 +00:00
Evgeniy Stepanov 46ec57e576 Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block"
This reverts commit r359879, which introduced a compiler crash.

llvm-svn: 359908
2019-05-03 17:31:49 +00:00
Sanjay Patel 8ff072e48e [CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359879
2019-05-03 13:09:18 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Sanjay Patel d47eac59ef [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.

In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.

The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.

Differential Revision: https://reviews.llvm.org/D59602

llvm-svn: 356665
2019-03-21 13:57:07 +00:00
Sanjay Patel fb44f99b73 [CGP][x86] add tests for usubo regression (PR41129); NFC
llvm-svn: 356559
2019-03-20 15:02:35 +00:00
Sanjay Patel 2c9275a790 [CGP] add another bailout for degenerate code (PR41064)
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.

llvm-svn: 356218
2019-03-14 23:14:31 +00:00
Tim Northover 8935aca9c7 CodeGenPrep: preserve inbounds attribute when sinking GEPs.
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.

llvm-svn: 355926
2019-03-12 15:22:23 +00:00
Sam Parker 52760bf435 [CGP] Limit distance between overflow math and cmp
Inserting an overflowing arithmetic intrinsic can increase register
pressure by producing two values at a point where only one is needed,
while the second use maybe several blocks away. This increase in
pressure is likely to be more detrimental on performance than
rematerialising one of the original instructions.
    
So, check that the arithmetic and compare instructions are no further
apart than their immediate successor/predecessor.

Differential Revision: https://reviews.llvm.org/D59024

llvm-svn: 355823
2019-03-11 13:19:46 +00:00
Rong Xu ce3be45cac [CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst
r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be
maintained correctly in optimizations in optimizeBlock() and optimizeInst().
Function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().

This patch also removes ModifiedDT in CodeGenPrepare class (which is not used).
The property of ModifiedDT is now recorded in a ref parameter.

Differential Revision: https://reviews.llvm.org/D59139

llvm-svn: 355751
2019-03-08 22:46:18 +00:00
Florian Hahn 13bbcb3264 [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.
This uses the infrastructure added in rL353152 to sink zext and sexts to
sub/add users, to enable vsubl/vaddl generation when NEON is available.

See https://bugs.llvm.org/show_bug.cgi?id=40025.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D58063

llvm-svn: 355460
2019-03-06 00:10:03 +00:00
Sanjay Patel 3b2d0bc7c2 [CodeGenPrepare] avoid crashing on non-canonical/degenerate code
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html

While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?

llvm-svn: 355345
2019-03-04 22:47:13 +00:00
Sanjay Patel cb04ba032f [CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.

Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.

The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.

This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354746
2019-02-24 15:31:27 +00:00