Commit Graph

25239 Commits

Author SHA1 Message Date
Sergey Dmitriev 0f70f73308 [Attributor] Bitcast constant to the returned value type if it has different type
Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79277
2020-05-03 11:46:13 -07:00
Hongtao Yu 911e06f5eb [ICP] Handling must tail calls in indirect call promotion
Per the IR convention, a musttail call must precede a ret with an optional bitcast. This was violated by the indirect call promotion optimization which could result an IR like:

    ; <label>:2192:
      br i1 %2198, label %2199, label %2201, !dbg !226012, !prof !229483

    ; <label>:2199:                                   ; preds = %2192
      musttail call fastcc void @foo(i8* %2195), !dbg !226012
      br label %2202, !dbg !226012

    ; <label>:2201:                                   ; preds = %2192
      musttail call fastcc void %2197(i8* %2195), !dbg !226012
      br label %2202, !dbg !226012

    ; <label>:2202:                                   ; preds = %605, %2201, %2199
      ret void, !dbg !229485

This is being fixed in this change where the return statement goes together with the promoted indirect call. The code generated is like:

    ; <label>:2192:
      br i1 %2198, label %2199, label %2201, !dbg !226012, !prof !229483

    ; <label>:2199:                                   ; preds = %2192
      musttail call fastcc void @foo(i8* %2195), !dbg !226012
      ret void, !dbg !229485

    ; <label>:2201:                                   ; preds = %2192
      musttail call fastcc void %2197(i8* %2195), !dbg !226012
      ret void, !dbg !229485

Differential Revision: https://reviews.llvm.org/D79258
2020-05-03 10:42:22 -07:00
Mircea Trofin bec4ab95a4 [llvm][NFC] Inliner: factor cost and reporting out of inlining process
Summary:
This factors cost and reporting out of the inlining workflow, thus
making it easier to reuse when driving inlining from the upcoming
InliningAdvisor.

Depends on: D79215

Reviewers: davidxl, echristo

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79275
2020-05-03 10:38:28 -07:00
Florian Hahn bbdfcf8f69 [VPlan] Remove unused & undefined print method (NFC). 2020-05-03 18:36:20 +01:00
Johannes Doerfert 8228153f87 [Attributor][NFC] Encode IRPositions in the bits of a single pointer
This reduces memory consumption for IRPositions by eliminating the
vtable pointer and the `KindOrArgNo` integer. Since each abstract
attribute has an associated IRPosition, the 12-16 bytes we save add up
quickly.

No functional change is intended.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 469545 (260135/s)
temporary memory allocations: 77137 (42735/s)
peak heap memory consumption: 30.50MB
peak RSS (including heaptrack overhead): 119.50MB
total memory leaked: 269.07KB
```

After:
```
calls to allocation functions: 468999 (274108/s)
temporary memory allocations: 77002 (45004/s)
peak heap memory consumption: 28.83MB
peak RSS (including heaptrack overhead): 118.05MB
total memory leaked: 269.07KB
```

Difference:
```
calls to allocation functions: -546 (5808/s)
temporary memory allocations: -135 (1436/s)
peak heap memory consumption: -1.67MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```

---

CTMark 15 runs

Metric: compile_time

Program                                        lhs    rhs    diff
 test-suite...:: CTMark/sqlite3/sqlite3.test    25.07  24.09 -3.9%
 test-suite...Mark/mafft/pairlocalalign.test    14.58  14.14 -3.0%
 test-suite...-typeset/consumer-typeset.test    21.78  21.58 -0.9%
 test-suite :: CTMark/SPASS/SPASS.test          21.95  22.03  0.4%
 test-suite :: CTMark/lencod/lencod.test        25.43  25.50  0.3%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    23.88  23.83 -0.2%
 test-suite...TMark/7zip/7zip-benchmark.test    60.24  60.11 -0.2%
 test-suite :: CTMark/kimwitu++/kc.test         15.69  15.69 -0.0%
 test-suite...:: CTMark/ClamAV/clamscan.test    25.43  25.42 -0.0%
 test-suite :: CTMark/Bullet/bullet.test        37.63  37.62 -0.0%
 Geomean difference                                          -0.8%

---

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D78722
2020-05-03 12:15:19 -05:00
Johannes Doerfert 6bf16ee4c5 [Attributor][NFC] Let AbstractAttribute be an IRPosition
Since every AbstractAttribute so far, and for the foreseeable future,
corresponds to a single IRPosition we can simplify the class structure.
We already did this for IRAttribute but there is no reason to stop
there.
2020-05-03 12:13:40 -05:00
Mircea Trofin 667f558c3f [llvm][NFC] Inliner.cpp shouldInline post-commit feedback
Discussion is in https://reviews.llvm.org/D79215
2020-05-03 09:31:31 -07:00
Sanjay Patel 682f0b366b [InstCombine] use select-of-constants with set/clear bit mask patterns
Cond ? (X & ~C) : (X | C) --> (X & ~C) | (Cond ? 0 : C)
Cond ? (X | C) : (X & ~C) --> (X & ~C) | (Cond ? C : 0)

The select-of-constants form results in better codegen.
There's an existing test diff that shows a transform that
results in an extra IR instruction, but that's an existing
problem.

This is motivated by code seen in LLVM itself - see PR37581:
https://bugs.llvm.org/show_bug.cgi?id=37581

define i8 @src(i8 %x, i8 %C, i1 %b)  {
  %notC = xor i8 %C, -1
  %and = and i8 %x, %notC
  %or = or i8 %x, %C
  %cond = select i1 %b, i8 %or, i8 %and
  ret i8 %cond
}

define i8 @tgt(i8 %x, i8 %C, i1 %b)  {
  %notC = xor i8 %C, -1
  %and = and i8 %x, %notC
  %mul = select i1 %b, i8 %C, i8 0
  %or = or i8 %mul, %and
  ret i8 %or
}

http://volta.cs.utah.edu:8080/z/Vt2WVm

Differential Revision: https://reviews.llvm.org/D78880
2020-05-03 09:44:43 -04:00
Nikita Popov b7e2358220 Remove getNumUses() comparisons (NFC)
getNumUses() scans the full use list. Don't use it is we only want
to check if there's zero or one uses.
2020-05-02 11:05:19 +02:00
Nikita Popov 60e9ee16b4 [MergeFuncs] Don't merge shufflevectors with different masks
When the shufflevector mask operand was converted into special
instruction data, the FunctionComparator was not updated to
account for this. As such, MergeFuncs will happily merge
shufflevectors with different masks.

This fixes https://bugs.llvm.org/show_bug.cgi?id=45773.

Differential Revision: https://reviews.llvm.org/D79261
2020-05-02 10:21:14 +02:00
Mircea Trofin 3dbc612cf2 [llvm][NFC] Rename variable as per https://reviews.llvm.org/D79215
Operator error - performed the rename and didn't save.
2020-05-01 16:30:41 -07:00
Mircea Trofin e1c4a7cb16 [llvm][NFC] Inliner: simplify inlining decision logic
Summary:
shouldInline makes a decision based on the InlineCost of a call site, as
well as an evaluation on whether the site should be deferred. This means
it's possible for the decision to be not to inline, even for an
InlineCost that would otherwise allow it.

Both uses of shouldInline performed the exact same logic after calling
it. In addition, the decision on whether to inline or not was
communicated through two values of the Option<InlineCost> return value:
None, or an InlineCost evaluating to false.

Simplified by:
- encapsulating the decision in the return object. The bool it evaluates
to communicates unambiguously the decision. The InlineCost is also
available.
- encapsulated the common post-shouldInline code into shouldInline.

Reviewers: davidxl, echristo, eraman

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79215
2020-05-01 16:18:59 -07:00
Christopher Tetreault beeabe382d [SVE] Fix invalid usage of VectorType::getNumElements() in InstCombine
Summary:
Make foldVectorBinop return null if the instruction type is a scalable
vector. It is unclear what, if any, of this function works with scalable
vectors.

Identified by test LLVM.Transforms/InstCombine::nsw.ll

Reviewers: efriedma, david-arm, fpetrogalli, spatel

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79196
2020-05-01 10:56:29 -07:00
Sanjay Patel 7fa150203f [InstCombine] fix miscompile from multi-use cttz/ctlz transform
PR45762:
https://bugs.llvm.org/show_bug.cgi?id=45762
2020-05-01 13:52:24 -04:00
Florian Hahn d911c17596 [SCCP] Get a copy of the state of CopyOf once.
This fixes potential reference invalidations, when no lattice value is
assigned for CopyOf. As the state of CopyOf won't change while in
handleCallResult, we can get a copy once and use that.

Should fix PR45749.
2020-05-01 14:46:35 +01:00
Benjamin Kramer 7a5a1e9460 [IR] AttributeList::getContext has a single user, remove it. 2020-05-01 14:18:29 +02:00
Florian Hahn 19ab53f1e2 [LoopVersioning] Update setAliasChecks to take ArrayRef argument (NFC).
This cleanup was suggested as part of D78458.
2020-04-30 22:17:12 +01:00
Nikita Popov b74c6d2c9d [InlineFunction] Disable emission of alignment assumptions by default
In D74183 clang started emitting alignment for sret parameters
unconditionally. This caused a 1.5% compile-time regression on
tramp3d-v4. The reason is that we now generate many instance of IR like

    %ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
    %maskedptr = and i64 %ptrint, 3
    %maskcond = icmp eq i64 %maskedptr, 0
    tail call void @llvm.assume(i1 %maskcond)

to preserve the alignment information during inlining. Based on IR
analysis, these assumptions also regress optimization. The attached
phase ordering test case illustrates two issues: One are instruction
count based optimization heuristics, which are affected by the four
additional instructions of the assumption. The other is blocking of
SROA due to ptrtoint casts (PR45763).

We already encountered the same problem in Rust, where we (unlike
Clang) generally prefer to emit alignment information absolutely
everywhere it is available. We were only able to do this after
hardcoding -preserve-alignment-assumptions-during-inlining=false,
because we were seeing significant optimization and compile-time
regressions otherwise.

This patch disables -preserve-alignment-assumptions-during-inlining
by default, because we should not be punishing people for adding
more alignment annotations.

Once the assume bundle work shakes out and we can represent (and use)
alignment assumptions using assume bundles, it should be possible to
re-enable this with reduced overhead.

Differential Revision: https://reviews.llvm.org/D76886
2020-04-30 23:12:54 +02:00
Arthur Eubanks a90948fd6e [NFC] Rename *ByValOrInalloca* to *PassPointeeByValue*
Summary: In preparation for preallocated.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79152
2020-04-30 09:42:13 -07:00
Jann Horn a22685885d [AddressSanitizer] Instrument byval call arguments
Summary:
In the LLVM IR, "call" instructions read memory for each byval operand.
For example:

```
$ cat blah.c
struct foo { void *a, *b, *c; };
struct bar { struct foo foo; };
void func1(const struct foo);
void func2(struct bar *bar) { func1(bar->foo); }
$ [...]/bin/clang -S -flto -c blah.c -O2 ; cat blah.s
[...]
define dso_local void @func2(%struct.bar* %bar) local_unnamed_addr #0 {
entry:
  %foo = getelementptr inbounds %struct.bar, %struct.bar* %bar, i64 0, i32 0
  tail call void @func1(%struct.foo* byval(%struct.foo) align 8 %foo) #2
  ret void
}
[...]
$ [...]/bin/clang -S -c blah.c -O2 ; cat blah.s
[...]
func2:                                  # @func2
[...]
        subq    $24, %rsp
[...]
        movq    16(%rdi), %rax
        movq    %rax, 16(%rsp)
        movups  (%rdi), %xmm0
        movups  %xmm0, (%rsp)
        callq   func1
        addq    $24, %rsp
[...]
        retq
```

Let ASAN instrument these hidden memory accesses.

This is patch 4/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments

Reviewers: kcc, glider

Reviewed By: glider

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77619
2020-04-30 17:09:13 +02:00
Jann Horn cfe36e4c6a [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
Summary:
Refactor getInterestingMemoryOperands() so that information about the
pointer operand is returned through an array of structures instead of
passing each piece of information separately by-value.

This is in preparation for returning information about multiple pointer
operands from a single instruction.

A side effect is that, instead of repeatedly generating the same
information through isInterestingMemoryAccess(), it is now simply collected
once and then passed around; that's probably more efficient.

HWAddressSanitizer has a bunch of copypasted code from AddressSanitizer,
so these changes have to be duplicated.

This is patch 3/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments

[glider: renamed llvm::InterestingMemoryOperand::Type to OpType to fix
GCC compilation]

Reviewers: kcc, glider

Reviewed By: glider

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77618
2020-04-30 17:09:13 +02:00
Jann Horn 223a95fdf0 [AddressSanitizer] Split out memory intrinsic handling
Summary:
In both AddressSanitizer and HWAddressSanitizer, we first collect
instructions whose operands should be instrumented and memory intrinsics,
then instrument them. Both during collection and when inserting
instrumentation, they are handled separately.

Collect them separately and instrument them separately. This is a bit
more straightforward, and prepares for collecting operands instead of
instructions in a future patch.

This is patch 2/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments

Reviewers: kcc, glider

Reviewed By: glider

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77617
2020-04-30 17:09:13 +02:00
Jann Horn e29996c9a2 [AddressSanitizer] Refactor ClDebug{Min,Max} handling
Summary:
A following commit will split the loop over ToInstrument into two.
To avoid having to duplicate the condition for suppressing instrumentation
sites based on ClDebug{Min,Max}, refactor it out into a new function.

While we're at it, we can also avoid the indirection through
NumInstrumented for setting FunctionModified.

This is patch 1/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments

Reviewers: kcc, glider

Reviewed By: glider

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77616
2020-04-30 17:09:13 +02:00
Alexander Potapenko 7e7754df32 Revert an accidental commit of four AddressSanitizer refactor CLs
I couldn't make arc land the changes properly, for some reason they all got
squashed. Reverting them now to land cleanly.

Summary: This reverts commit cfb5f89b62.

Reviewers: kcc, thejh

Subscribers:
2020-04-30 16:15:43 +02:00
Jann Horn cfb5f89b62 [AddressSanitizer] Refactor ClDebug{Min,Max} handling
Summary:
A following commit will split the loop over ToInstrument into two.
To avoid having to duplicate the condition for suppressing instrumentation
sites based on ClDebug{Min,Max}, refactor it out into a new function.

While we're at it, we can also avoid the indirection through
NumInstrumented for setting FunctionModified.

This is patch 1/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments

Reviewers: kcc, glider

Reviewed By: glider

Subscribers: jfb, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77616
2020-04-30 15:30:46 +02:00
David Spickett 3929429347 [globalopt] Don't emit DWARF fragments for members
of a struct that cover the whole struct

This can happen when the rest of the
members of are zero length. Following
the same pattern applied to the SROA
pass in:
d7f6f1636d

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45335

Differential Revision: https://reviews.llvm.org/D78720
2020-04-30 11:36:55 +01:00
Evgeniy Brevnov 3acf62f3ad [BPI][NFC] IRCE shoud qequest BPI through analysis manager.
Summary: There is no need to create BPI explicitly. It should be requested through AM in a normal way.

Reviewers: skatkov

Reviewed By: skatkov

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79080
2020-04-30 16:04:06 +07:00
Evgeniy Brevnov 3e68a66704 [BPI][NFC] Reuse post dominantor tree from analysis manager when available
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.

Reviewers: skatkov, taewookoh, yrouban

Reviewed By: skatkov

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78987
2020-04-30 11:31:03 +07:00
Mircea Trofin 3ab319b295 [llvm][NFC] Use CallBase explicitly instead of Instruction in FunctionComparator
Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79098
2020-04-29 15:37:46 -07:00
Mircea Trofin 2c7ff270d2 [llvm][NFC] Inliner: rename call site variables.
Summary:
Renamed 'CS' to 'CB', and, in one case, to a more specific name to avoid
naming collision with outer scope (a maintainability/readability reason,
not correctness)

Also updated comments.

Reviewers: davidxl, dblaikie, jdoerfert

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79101
2020-04-29 15:36:29 -07:00
Anh Tuyen Tran c7878ad231 [VFDatabase] Scalar functions are vector functions with VF =1
Summary:
Return scalar function when VF==1. The new trivial mapping scalar --> scalar when VF==1 to prevent false positive for "isVectorizable" query.

Author: masoud.ataei (Masoud Ataei)

Reviewers: Whitney (Whitney Tsang), fhahn (Florian Hahn), pjeeva01 (Jeeva P.), fpetrogalli (Francesco Petrogalli), rengolin (Renato Golin)

Reviewed By: fpetrogalli (Francesco Petrogalli)

Subscribers: hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D78054
2020-04-29 17:20:37 +00:00
Mircea Trofin 4632b7292a [llvm][NFC] Removed addressed fixme; formatting.
Removed already-addressed fixme, and updated formatting of a few lines
that were triggering Harbormaster.
2020-04-29 09:06:01 -07:00
Hiroshi Yamauchi 1831986826 [PGO][PGSO] Prep for enabling non-cold code size opts under non-partial-profile sample PGO.
Summary:
- Distinguish between partial-profile and non-partial-profile sample PGO.
- Add a flag for partial-profile sample PGO.
- Tune the sample PGO cutoff.
- No default behavior change (yet).

Reviewers: davidxl

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78949
2020-04-29 08:57:47 -07:00
Mircea Trofin e61247c0a8 [llvm][NFC] Change parameter type to more specific CallBase in IndirectCallPromotion
Reviewers: dblaikie, craig.topper, wmi

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79047
2020-04-29 08:42:32 -07:00
Simon Pilgrim 090cae8491 [TTI] Add DemandedElts to getScalarizationOverhead
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.

This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.

This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.

A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.

Reviewed By: @craig.topper

Differential Revision: https://reviews.llvm.org/D78216
2020-04-29 12:00:38 +01:00
Florian Hahn e89379856a Recommit "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)."
The crash that caused the original revert has been fixed in
a3c964a278. I also added a reduced version of the crash reproducer.

This reverts the revert commit 2107af9ccf.
2020-04-29 11:40:39 +01:00
Florian Hahn 616657b39c [LAA] Move CheckingPtrGroup/PointerCheck outside class (NFC).
This allows forward declarations of PointerCheck, which in turn reduce
the number of times LoopAccessAnalysis needs to be included.

Ultimately this helps with moving runtime check generation to
Transforms/Utils/LoopUtils.h, without having to include it there.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D78458
2020-04-28 21:47:31 +01:00
Mircea Trofin 8a7cf11f92 [llvm][NFC] Refactor APIs operating on CallBase
Summary:
Refactored the parameter and return type where they are too generally
typed as Instruction.

Reviewers: dblaikie, wmi, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79027
2020-04-28 13:23:47 -07:00
David Blaikie 95e570725a OpenMPOpt::RuntimeFunctionInfo::UsesMap: Use unique_ptr for values to simplify memory management 2020-04-28 12:26:53 -07:00
David Blaikie 3c89256d71 Attributor::ArgumentReplacementMap: Use unique_ptr to simplify memory management 2020-04-28 12:26:52 -07:00
Roman Lebedev a0004358a8
[InstCombine] Negator: 'or' with no common bits set is just 'add'
In `InstCombiner::visitAdd()`, we have
```
  // A+B --> A|B iff A and B have no bits set in common.
  if (haveNoCommonBitsSet(LHS, RHS, DL, &AC, &I, &DT))
    return BinaryOperator::CreateOr(LHS, RHS);
```
so we should handle such `or`'s here, too.
2020-04-28 19:16:32 +03:00
Sam Parker e9c9329aa4 [TTI] Add TargetCostKind argument to getUserCost
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.

The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.

getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.

This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.

Differential Revision: https://reviews.llvm.org/D78635
2020-04-28 08:57:45 +01:00
Craig Topper a58b62b4a2 [IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand().
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().

I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.

Differential Revision: https://reviews.llvm.org/D78882
2020-04-27 22:17:03 -07:00
Mircea Trofin cb56e9b923 [llvm][NFC] Use CallBase instead of Instruction in ProfileSummaryInfo
Summary:
getProfileCount requires the parameter be a valid CallBase, and its uses
reflect that.

Reviewers: dblaikie, craig.topper, wmi

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78940
2020-04-27 20:47:52 -07:00
Arthur Eubanks 3b0450acec Add IR constructs for preallocated (inalloca replacement)
Add llvm.call.preallocated.{setup,arg} instrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.

Verifier changes for these IR constructs.

See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74651
2020-04-27 16:15:50 -07:00
Sanjay Patel 21acc0612a [SLP] refactor load-combine logic; NFC
We may want to identify sequences that are not
reductions, but still qualify as load-combines
in the back-end, so make most of the body a
helper function.
2020-04-27 16:02:37 -04:00
Sameer Sahasrabuddhe 8488763682 [NFC] UnifyLoopExits: correctly skip expensive checks 2020-04-27 15:10:35 +05:30
Ayal Zaks a3c964a278 [LV] Fix recording of BranchTakenCount for FoldTail
When folding tail, branch taken count is computed during initial VPlan execution
and recorded to be used by the compare computing the loop's mask. This recording
should directly set the State, instead of reusing Value2VPValue mapping which
serves original Values present prior to vectorization.
The branch taken count may be a constant Value, which may be used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028

Differential Revision: https://reviews.llvm.org/D78847
2020-04-26 20:13:10 +03:00
Florian Hahn 2f3e86b318 [DSE,MSSA] Continue checking more remaining candidates with dbgcnt.
After changing the candidate iteration strategy, we should continue with
the next candidate, rather than breaking out of the loop.
2020-04-26 16:59:32 +01:00
Florian Hahn 7d57d22baa [SCCP] Support ranges for loads and stores.
Integer ranges can be used for loaded/stored values. Note that widening
can be disabled for loads/stores, as we only rely on instructions that
cause continued increases to ranges to be widened (like binary
operators).

Reviewers: efriedma, mssimpso, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78433
2020-04-26 13:16:47 +01:00
Simon Pilgrim a3982491db [Pass] Ensure we don't include PassSupport.h or PassAnalysisSupport.h directly
Both PassSupport.h and PassAnalysisSupport.h are only supposed to be included via Pass.h.

Differential Revision: https://reviews.llvm.org/D78815
2020-04-26 12:58:20 +01:00
Nikita Popov 164845cd92 [GVN] Reduce expression size (NFC)
Reduce size of GVN::Expression by reordering fields to reduce padding.
2020-04-26 09:43:35 +02:00
Sergei Trofimovich 09684b08d3 llvm: IPO: handle IRMover error handling, bug #45636
Summary:
Missing error mangling is noticed in
https://bugs.llvm.org/show_bug.cgi?id=45636
where inconsistent profiling input caused
llvm/lld to crash as:

```
Program aborted due to an unhandled Error:
linking module flags 'ProfileSummary':
  IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```

The change does not change the fact that LLVM crashes
but changes error output to say what was incorrect:

```
LLVM ERROR: Function Import: link error:
  linking module flags 'ProfileSummary':
    IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```

Actual crash has yet to be fixed.

Reviewers: lattner

Reviewed By: lattner

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78676
2020-04-25 19:16:01 +01:00
Sergey Dmitriev 67aed1469b [Attributor] Do not set 'returned' attribute for arguments that cannot be bitcasted to function result
Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78828
2020-04-25 09:49:40 -07:00
Sanjay Patel 4abab5c5ca [InstCombine] generalize canonicalization of masked equality comparisons
(X | MaskC) == C --> (X & ~MaskC) == C ^ MaskC
  (X | MaskC) != C --> (X & ~MaskC) != C ^ MaskC

We have more analyis for 'and' patterns and already lean this way
in the existing code, so this should be neutral or better in IR.

If this does not do as well in codegen, the problem already exists
and we should fix that based on target costs/heuristics.

http://volta.cs.utah.edu:8080/z/oP3ecL

define void @src(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
  %or = or i8 %x, %OrC
  %eq = icmp eq i8 %or, %C
  store i1 %eq, i1* %p0

  %ne = icmp ne i8 %or, %C
  store i1 %ne, i1* %p1
  ret void
}

define void @tgt(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
  %NotOrC = xor i8 %OrC, -1
  %a = and i8 %x, %NotOrC
  %NewC = xor i8 %C, %OrC
  %eq = icmp eq i8 %a, %NewC
  store i1 %eq, i1* %p0

  %ne = icmp ne i8 %a, %NewC
  store i1 %ne, i1* %p1
  ret void
}
2020-04-25 11:31:57 -04:00
Florian Hahn 46a04940e8 [DSE] Add stat for remaining stores after DSE.
Using the existing NumFastStores statistic can be misleading when
comparing the impact of DSE patches.

For example, consider the case where a store gets removed from a
function before it is inlined into another function. A less
powerful DSE might only remove the store from functions it has
been inlined into, which will result in more stores being removed, but
no difference in the actual number of stores after DSE.

The new stat provides the absolute number of stores surviving after
DSE.

Reviewers: dmgreen, bryant, asbirlea, jfb

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D78830
2020-04-25 16:12:55 +01:00
Tyker e5f8a77c19 [AssumeBundles] Refactor asssume builder
Summary:
refactor assume bulider for the next patch.
the assume builder now generate only one assume per attribute kind and per value they are on. to do this it takes the highest. this is desirable because currently, for all attributes the higest value is the most valuable.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78013
2020-04-25 13:43:52 +02:00
Benjamin Kramer 1d42764df7 Give helpers internal linkage. NFC. 2020-04-25 11:50:52 +02:00
Ehud Katz 64249f177e [CodeExtractor] Fix extraction of a value used only by intrinsics outside of region
We should only skip `lifetime` and `dbg` intrinsics when searching for users.
Other intrinsics are legit users that can't be ignored.

Without this fix, the testcase would result in an invalid IR. `memcpy`
will have a reference to the, now, external value (local to the
extracted loop function).

Fix PR42194

Differential Revision: https://reviews.llvm.org/D78749
2020-04-25 11:44:47 +03:00
Craig Topper 2c24051bac [CallSite removal] Rename CallSite.h to AbstractCallSite.h. NFC
The CallSite and ImmutableCallSite were removed in a previous
commit. So rename the file to match the remaining class and
the name of the cpp that implements it.
2020-04-24 22:12:25 -07:00
Tyker 97ecd91e20 [NFC] Refactor SimplifyCFG to make propagating information easier.
Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77742
2020-04-24 22:22:20 +02:00
Michael Liao 495bb8feb9 Fix `-Wparentheses` warnings. NFC. 2020-04-24 15:04:01 -04:00
Tyker 42431da895 [AssumeBundles] Use assume bundles in isKnownNonZero
Summary: Use nonnull and dereferenceable from an assume bundle in isKnownNonZero

Reviewers: jdoerfert, nikic, lebedev.ri, reames, fhahn, sstefan1

Reviewed By: jdoerfert

Subscribers: fhahn, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76149
2020-04-24 20:41:51 +02:00
Florian Hahn e1235831c4 [DSE,MSSA] Improve debug output (NFC).
This patch slightly improves the formatting of the debug output, adds a
few missing outputs and makes some existing outputs more consistent with
the rest.
2020-04-24 17:50:08 +01:00
Florian Hahn 44ce588670 [DSE,MSSA] Skip checking write clobber for DomAccess (NFC).
There is no need to check if the starting access for is a write clobber
and all of its uses have already been checked.
2020-04-24 17:16:22 +01:00
Sanjay Patel e4175ff525 [InstCombine] intersect FMF when reassociating FP min/max intrinsics
As discussed in PR45478:
https://bugs.llvm.org/show_bug.cgi?id=45478
...propagating FMF from the outer (second) call is not correct,
so intersect them instead.
I suspect we could do better (see TODO comment), but mismatched
FMF is probably too rare to care about.

Differential Revision: https://reviews.llvm.org/D78631
2020-04-24 12:14:03 -04:00
Simon Pilgrim 27ad103a3a ARCRuntimeEntryPoints.h - remove unnecessary includes. NFC. 2020-04-24 14:32:45 +01:00
Max Kazantsev 9cd4debd5a [LoopVectorize] Preserve CFG analyses if CFG wasn't modified
One of transforms the loop vectorizer makes is LCSSA formation. In some cases it
is the only transform it makes. We should not drop CFG analyzes if only LCSSA was
formed and no actual CFG changes was made.

We should think of expanding this logic to other passes as well, and maybe make
it a part of PM framework.

Reviewed By: Florian Hahn
Differential Revision: https://reviews.llvm.org/D78360
2020-04-24 17:22:24 +07:00
Johannes Doerfert 1dfc473177 Revert "[Attributor][NFC] Encode IRPositions in the bits of a single pointer"
A dependent patch has been reverted [0]. Until it goes back in this one
has to stay out.

[0] ebdb893994

This reverts commit d254b50b2b.
2020-04-24 02:53:51 -05:00
Johannes Doerfert d254b50b2b [Attributor][NFC] Encode IRPositions in the bits of a single pointer
This reduces memory consumption for IRPositions by eliminating the
vtable pointer and the `KindOrArgNo` integer. Since each abstract
attribute has an associated IRPosition, the 12-16 bytes we save add up
quickly.

No functional change is intended.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 469545 (260135/s)
temporary memory allocations: 77137 (42735/s)
peak heap memory consumption: 30.50MB
peak RSS (including heaptrack overhead): 119.50MB
total memory leaked: 269.07KB
```

After:
```
calls to allocation functions: 468999 (274108/s)
temporary memory allocations: 77002 (45004/s)
peak heap memory consumption: 28.83MB
peak RSS (including heaptrack overhead): 118.05MB
total memory leaked: 269.07KB
```

Difference:
```
calls to allocation functions: -546 (5808/s)
temporary memory allocations: -135 (1436/s)
peak heap memory consumption: -1.67MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```

---

CTMark 15 runs

Metric: compile_time

Program                                        lhs    rhs    diff
 test-suite...:: CTMark/sqlite3/sqlite3.test    25.07  24.09 -3.9%
 test-suite...Mark/mafft/pairlocalalign.test    14.58  14.14 -3.0%
 test-suite...-typeset/consumer-typeset.test    21.78  21.58 -0.9%
 test-suite :: CTMark/SPASS/SPASS.test          21.95  22.03  0.4%
 test-suite :: CTMark/lencod/lencod.test        25.43  25.50  0.3%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    23.88  23.83 -0.2%
 test-suite...TMark/7zip/7zip-benchmark.test    60.24  60.11 -0.2%
 test-suite :: CTMark/kimwitu++/kc.test         15.69  15.69 -0.0%
 test-suite...:: CTMark/ClamAV/clamscan.test    25.43  25.42 -0.0%
 test-suite :: CTMark/Bullet/bullet.test        37.63  37.62 -0.0%
 Geomean difference                                          -0.8%

---

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D78722
2020-04-24 01:58:47 -05:00
Mircea Trofin b8960b5d81 [llvm][NFC][CallSite] Remove remaining {Immutable}CallSite uses
Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78789
2020-04-23 22:19:39 -07:00
Mehdi Amini 2107af9ccf Revert "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)."
This reverts commit 9245c7ac13.

This is triggering a segfault in XLA downstream, we'll follow-up with
a reproducer, it is likely influenced by TTI/TLI settings or other
options as a simple `opt -loop-vectorize` invocation on the IR
before the crash does not reproduce immediately.
2020-04-24 05:07:32 +00:00
Mircea Trofin 2059a6e3ef [llvm][NFC][CallSite] Remove ImmutableCallSite from a few locations
Reviewers: craig.topper, dblaikie

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78783
2020-04-23 21:18:44 -07:00
Craig Topper cbe77ca9bd [CallSite removal] Remove unneeded includes of CallSite.h. NFC 2020-04-23 21:01:48 -07:00
Craig Topper 81c5e83f7d [CallSite removal][Transform] Replace CallSite with CallBase in Utils. NFC
Differential Revision: https://reviews.llvm.org/D78780
2020-04-23 20:49:33 -07:00
Roman Lebedev 5a159ed2a8
[InstCombine] Negator: don't negate multi-use `sub`
While we can do that, it doesn't increase instruction count,
if the old `sub` sticks around then the transform is not only
not a unlikely win, but a likely regression, since we likely
now extended live range and use count of both of the `sub` operands,
as opposed to just the result of `sub`.

As Kostya Serebryany notes in post-commit review in
https://reviews.llvm.org/D68408#1998112
this indeed can degrade final assembly,
increase register pressure, and spilling.

This isn't what we want here,
so at least for now let's guard it with an use check.
2020-04-23 23:59:15 +03:00
Christopher Tetreault 7ca56c90bd [SVE] Remove calls to isScalable from Transforms
Reviewers: efriedma, chandlerc, reames, aprantl, sdesmalen

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77756
2020-04-23 13:50:07 -07:00
Mircea Trofin ceb7f308b8 [llvm][NFC][CallSite] Removed CallSite from few implementation details
Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78724
2020-04-23 10:36:36 -07:00
Mircea Trofin cea6f4d5f8 [llvm][NFC][CallSite] Remove CallSite from TypeMetadataUtils & related
Reviewers: craig.topper, dblaikie

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78666
2020-04-23 08:23:16 -07:00
Sanjay Patel 62da6ecea2 [InstCombine] substitute equivalent constant to reduce logic-of-icmps
(X == C) && (Y Pred1 X) --> (X == C) && (Y Pred1 C)
(X != C) || (Y Pred1 X) --> (X != C) || (Y Pred1 C)

This cooperates/overlaps with D78430, but it is a more general transform
that gets us most of the expected simplifications and several other
improvements.
http://volta.cs.utah.edu:8080/z/5gxjjc

PR45618:
https://bugs.llvm.org/show_bug.cgi?id=45618

Differential Revision: https://reviews.llvm.org/D78582
2020-04-23 10:19:16 -04:00
Simon Pilgrim 7a8b1096be [ObjCARC] Remove unused forward declarations. NFC. 2020-04-23 13:52:49 +01:00
Simon Pilgrim b108a457e1 [VPlan] Remove unused forward declarations. NFC.
Move VPlan.h include from VPlanVerifier.h down to VPlanVerifier.cpp
2020-04-23 12:34:20 +01:00
Serguei Katkov c0d2bbb1d4 [CaptureTracking] Replace hardcoded constant to option. NFC.
The motivation is to be able to play with the option and change if it is required.

Reviewers: fedor.sergeev, apilipenko, rnk, jdoerfert
Reviewed By: fedor.sergeev
Subscribers: hiraditya, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D78624
2020-04-23 18:23:35 +07:00
Florian Hahn 9245c7ac13 [VPlan] Add & use VPValue operands for VPWidenRecipe (NFC).
This patch adds VPValue version of the instruction operands to
VPWidenRecipe and uses them during code-generation.

Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.

Reviewers: rengolin, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76992
2020-04-23 12:16:46 +01:00
Craig Topper 25807452ac [ArgumentPromotion] Remove unnecessary getScalarType() before casting to PointerType. NFC
I don't believe this pass deals with vectors of pointers. I think
this getScalarType() was added during a mechanical opaque pointer
change of the interface to GetElementPtrInst::getIndexedType.
2020-04-22 22:51:41 -07:00
Vedant Kumar 2fa656cdfd [Debugify] Do not require named metadata to be present when stripping
This allows -mir-strip-debug to be run without -debugify having run
before.
2020-04-22 17:03:39 -07:00
Vedant Kumar 2a5675f11d [MachineDebugify] Insert synthetic DBG_VALUE instructions
Summary:
Teach MachineDebugify how to insert DBG_VALUE instructions.  This can
help find bugs causing CodeGen differences when debug info is present.
DBG_VALUE instructions are only emitted when -debugify-level is set to
locations+variables.

There is essentially no attempt made to match up DBG_VALUE register
operands with the local variables they ought to correspond to. I'm not
sure how to improve the situation. In some cases (MachineMemOperand?)
it's possible to find the IR instruction a MachineInstr corresponds to,
but in general this seems to call for "undoing" the work done by ISel.

Reviewers: dsanders, aprantl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78135
2020-04-22 17:03:39 -07:00
Juneyoung Lee aca335955c [ValueTracking] Let analyses assume a value cannot be partially poison
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.

This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.

In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.

Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr

Reviewed By: nikic

Subscribers: fhahn, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78503
2020-04-23 08:08:53 +09:00
Juneyoung Lee 5ceef26350 Revert "RFC: [ValueTracking] Let analyses assume a value cannot be partially poison"
This reverts commit 80faa8c3af.
2020-04-23 08:07:09 +09:00
Juneyoung Lee 80faa8c3af RFC: [ValueTracking] Let analyses assume a value cannot be partially poison
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.

This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.

In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.

Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr

Reviewed By: nikic

Subscribers: fhahn, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78503
2020-04-23 07:57:12 +09:00
Florian Hahn 352b612a71 [SCCP] Drop unnecessary early exit for ExtractValueInst.
visitExtractValueInst uses mergeInValue, so it already can handle
constant ranges. Initially the early exit was using isOverdefined to
keep things as NFC during the initial move to ValueLatticeElement.
As the function already supports constant ranges, it can just use
ValueState[&I].isOverdefined.

Reviewers: efriedma, mssimpso, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78393
2020-04-22 22:07:59 +01:00
Craig Topper be04aba6fc [CallSite removal][ValueTracking] Use CallBase instead of ImmutableCallSite for getIntrinsicForCallSite. NFC
Differential Revision: https://reviews.llvm.org/D78613
2020-04-22 12:06:58 -07:00
Christopher Tetreault 2dea3f1298 [SVE] Add new VectorType subclasses
Summary:
Introduce new types for fixed width and scalable vectors.

Does not remove getNumElements yet so as to not break code during transition
period.

Reviewers: deadalnix, efriedma, sdesmalen, craig.topper, huntergr

Reviewed By: sdesmalen

Subscribers: jholewinski, arsenm, jvesely, nhaehnle, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, kerbowa, Joonsoo, grosul1, frgossen, lldb-commits, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm, #lldb

Differential Revision: https://reviews.llvm.org/D77587
2020-04-22 08:59:01 -07:00
Mircea Trofin 1b6b05a250 [llvm][NFC][CallSite] Remove CallSite from a few trivial locations
Summary: Implementation details and internal (to module) APIs.

Reviewers: craig.topper, dblaikie

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78610
2020-04-22 08:39:21 -07:00
Dmitry Vyukov 5a2c31116f [TSAN] Add optional support for distinguishing volatiles
Add support to optionally emit different instrumentation for accesses to
volatile variables. While the default TSAN runtime likely will never
require this feature, other runtimes for different environments that
have subtly different memory models or assumptions may require
distinguishing volatiles.

One such environment are OS kernels, where volatile is still used in
various places for various reasons, and often declare volatile to be
"safe enough" even in multi-threaded contexts. One such example is the
Linux kernel, which implements various synchronization primitives using
volatile (READ_ONCE(), WRITE_ONCE()). Here the Kernel Concurrency
Sanitizer (KCSAN) [1], is a runtime that uses TSAN instrumentation but
otherwise implements a very different approach to race detection from
TSAN.

While in the Linux kernel it is generally discouraged to use volatiles
explicitly, the topic will likely come up again, and we will eventually
need to distinguish volatile accesses [2]. The other use-case is
ignoring data races on specially marked variables in the kernel, for
example bit-flags (here we may hide 'volatile' behind a different name
such as 'no_data_race').

[1] https://github.com/google/ktsan/wiki/KCSAN
[2] https://lkml.kernel.org/r/CANpmjNOfXNE-Zh3MNP=-gmnhvKbsfUfTtWkyg_=VqTxS4nnptQ@mail.gmail.com

Author: melver (Marco Elver)
Reviewed-in: https://reviews.llvm.org/D78554
2020-04-22 17:27:09 +02:00
Roman Lebedev 67266d879c
[InstCombine] Negator: shufflevector is negatible
All these folds are correct as per alive-tv
2020-04-22 15:14:23 +03:00
Craig Topper 05a11974ae [CallSite removal] Remove unneeded includes of CallSite.h. NFC 2020-04-22 00:07:13 -07:00
Johannes Doerfert ca59ff5af9 [Attributor] Replace AccessKind2Accesses map with an "array map"
The number of different access location kinds we track is relatively
small (8 so far). With this patch we replace the DenseMap that mapped
from index (0-7) to the access set pointer with an array of access set
pointers. This reduces memory consumption.

No functional change is intended.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 472499 (215654/s)
temporary memory allocations: 77794 (35506/s)
peak heap memory consumption: 35.28MB
peak RSS (including heaptrack overhead): 125.46MB
total memory leaked: 269.04KB
```

After:
```
calls to allocation functions: 472270 (308673/s)
temporary memory allocations: 77578 (50704/s)
peak heap memory consumption: 32.70MB
peak RSS (including heaptrack overhead): 121.78MB
total memory leaked: 269.04KB
```

Difference:
```
calls to allocation functions: -229 (346/s)
temporary memory allocations: -216 (326/s)
peak heap memory consumption: -2.58MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```

---
2020-04-22 01:35:27 -05:00
Johannes Doerfert f20ff4b17d [Attributor] Run IRPosition::verify only with EXPENSIVE_CHECKS 2020-04-22 01:35:12 -05:00
Sameer Sahasrabuddhe 5a7a6382bc FixIrreducible: don't crash when moving a child loop
Summary:
When an irreducible SCC is converted into a new natural loop, existing
loops included in that SCC now become children of the new loop. The
logic that moves these loops from the parent loop to the new loop
invoked undefined behaviour when it modified the container that it was
iterating over. Fixed this by first extracting all the loops that are
to be removed from the parent.

Fixes bug 45623.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D78544
2020-04-22 07:47:30 +05:30
Mircea Trofin 9ee02aef62 [llvm][NFC][CallSite] Remove CallSite from FunctionAttrs
Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78584
2020-04-21 16:16:00 -07:00
Johannes Doerfert 46b7ed0e6f [Attributor] Remove dependence edges eagerly
If we have a dependence between an abstract attribute A to an abstract
attribute B such hat changes in A should trigger an update of B, we do
not need to keep the dependence around once the update was triggered. If
the dependence is still required the update will reinsert it into the
dependence map, if it is not we avoid triggering B in the future. This
replaces the "recompute interval" mechanism we used before to prune
stale dependences.

Number of required iterations is generally down, compile time for the
module pass (not really the CGSCC pass) is down quite a bit.

There is one test change which looks like an artifact in the undefined
behavior AA that needs to be looked at.
2020-04-21 15:22:10 -05:00
Johannes Doerfert ea439bbcbb [Attributor][NFC] Track the number of created AAs in the statistics 2020-04-21 15:22:10 -05:00
Johannes Doerfert c5794f77eb [Attributor][PM] Introduce `-attributor-enable={none,cgscc,module,all}`
The old command line option `-attributor-disable` was too coarse grained
as we want to measure the effects of the module or cgscc pass without
the other as well.

Since `none` is the default there is no real functional change.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D78571
2020-04-21 15:22:10 -05:00
Michael Liao 163bd9d858 Fix `-Wpedantic` warnings. NFC. 2020-04-21 16:09:17 -04:00
Michael Liao 21529355e1 Fix `-Wparentheses` warnings. NFC. 2020-04-21 15:02:59 -04:00
Roman Lebedev 352fef3f11
[InstCombine] Negator - sink sinkable negations
Summary:
As we have discussed previously (e.g. in D63992 / D64090 / [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]]), `sub` instruction
can almost be considered non-canonical. While we do convert `sub %x, C` -> `add %x, -C`,
we sparsely do that for non-constants. But we should.

Here, i propose to interpret `sub %x, %y` as `add (sub 0, %y), %x` IFF the negation can be sinked into the `%y`

This has some potential to cause endless combine loops (either around PHI's, or if there are some opposite transforms).
For former there's `-instcombine-negator-max-depth` option to mitigate it, should this expose any such issues
For latter, if there are still any such opposing folds, we'd need to remove the colliding fold.
In any case, reproducers welcomed!

Reviewers: spatel, nikic, efriedma, xbolva00

Reviewed By: spatel

Subscribers: xbolva00, mgorny, hiraditya, reames, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68408
2020-04-21 22:00:23 +03:00
Benjamin Kramer 9a08c30705 Bit-pack some pairs. No functionlity change intended. 2020-04-21 20:40:20 +02:00
Fangrui Song cca545ce46 [CallSite] Fix build breakage after D78538 2020-04-21 11:33:40 -07:00
Mircea Trofin d702325af6 [llvm][NFC][CallSite] Remove CallSite from DeadArgumentElimination
Summary: Also capitalized some induction variables, to match coding style.

Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78538
2020-04-21 10:48:38 -07:00
Simon Pilgrim d9af50efbc [Transforms] getOrEnforceKnownAlignment - fix MSVC result of 32-bit shift implicitly converted to 64 bits warning. NFCI
We don't overflow here so we can use a U64 shift directly.
2020-04-21 18:32:12 +01:00
Johannes Doerfert 177c065e50 [Attributor] Use a pointer value type for the OpcodeInstMap
This reduces memory consumption and the need to copy complex data
structures repeatedly.

No functional change is intended.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 490390 (320725/s)
temporary memory allocations: 84601 (55330/s)
peak heap memory consumption: 41.70MB
peak RSS (including heaptrack overhead): 131.18MB
total memory leaked: 269.04KB
```

After:
```
calls to allocation functions: 489359 (301144/s)
temporary memory allocations: 82983 (51066/s)
peak heap memory consumption: 36.76MB
peak RSS (including heaptrack overhead): 126.48MB
total memory leaked: 269.04KB
```

Difference:
```
calls to allocation functions: -1031 (-10739/s)
temporary memory allocations: -1618 (-16854/s)
peak heap memory consumption: -4.94MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

---
2020-04-21 11:20:09 -05:00
Johannes Doerfert 99662c22cd [Attributor] Use a pointer value type for the QueryMap
This reduces memory consumption and the need to copy complex data
structures repeatedly.

No functional change is intended.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 596180 (374484/s)
temporary memory allocations: 84979 (53378/s)
peak heap memory consumption: 52.14MB
peak RSS (including heaptrack overhead): 139.79MB
total memory leaked: 269.04KB
```

After:
```
calls to allocation functions: 489200 (303285/s)
temporary memory allocations: 83406 (51708/s)
peak heap memory consumption: 41.70MB
peak RSS (including heaptrack overhead): 131.76MB
total memory leaked: 269.04KB
```

Difference:
```
calls to allocation functions: -106980 (-5094285/s)
temporary memory allocations: -1573 (-74904/s)
peak heap memory consumption: -10.44MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

---
2020-04-21 11:20:04 -05:00
Johannes Doerfert 1f570e019d [Attributor] Use a pointer value type for the access kind -> accesses map
This reduces memory consumption and the need to copy complex data
structures repeatedly.

No functional change is intended.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 616219 (381559/s)
temporary memory allocations: 83294 (51575/s)
peak heap memory consumption: 72.15MB
peak RSS (including heaptrack overhead): 160.04MB
total memory leaked: 269.04KB
```

After:
```
calls to allocation functions: 595004 (357145/s)
temporary memory allocations: 83840 (50324/s)
peak heap memory consumption: 52.14MB
peak RSS (including heaptrack overhead): 138.32MB
total memory leaked: 269.04KB
```

Difference:
```
calls to allocation functions: -21215 (-415980/s)
temporary memory allocations: 546 (10705/s)
peak heap memory consumption: -20.01MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

---
2020-04-21 11:20:02 -05:00
Johannes Doerfert 40f3baeb20 [Attributor] Pass the Attributor to the AbstractAttribute constructors
AbstractAttribute::initialize is used to initialize the deduction and
the object we do not always call it. To make sure we have the option to
initialize the object even if initialize is not called we pass the
Attributor to AbstractAttribute constructors now.
2020-04-21 11:20:02 -05:00
Johannes Doerfert 91a6c88349 [Attributor] Use a pointer value type for the AAMap
This reduces memory consumption and the need to copy complex data
structures repeatedly.

No functional change is intended.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 613353 (376521/s)
temporary memory allocations: 83636 (51341/s)
peak heap memory consumption: 75.64MB
peak RSS (including heaptrack overhead): 162.97MB
total memory leaked: 269.04KB
```

After:
```
calls to allocation functions: 616575 (349929/s)
temporary memory allocations: 83650 (47474/s)
peak heap memory consumption: 72.15MB
peak RSS (including heaptrack overhead): 159.81MB
total memory leaked: 269.04KB
```

Difference:
```
calls to allocation functions: 3222 (24225/s)
temporary memory allocations: 14 (105/s)
peak heap memory consumption: -3.49MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

---
2020-04-21 11:19:58 -05:00
Sanjay Patel 978166f209 [InstCombine] improve types/names for logic-of-icmp helper function; NFC 2020-04-21 10:16:45 -04:00
Florian Hahn 647c9e72e4 [VPlan] Make various tryTo* helpers private and mark as const (NFC).
The individual tryTo* helpers do not need to be public. Also, the
builder contained two consecutive public: sections, which is not
necessary. Moved the remaining public methods after the constructor.

Also make some of the tryTo* helpers const.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed by: gilr

Differential Revision: https://reviews.llvm.org/D78288
2020-04-21 14:49:02 +01:00
Sanjay Patel ba72389269 [InstCombine] improve types/names for logic-of-icmp helper functions; NFC 2020-04-21 09:18:22 -04:00
Craig Topper 6235951ec0 [CallSite removal][Instrumentation] Use CallBase instead of CallSite in AddressSanitizer/DataFlowSanitizer/MemorySanitizer. NFC
Differential Revision: https://reviews.llvm.org/D78524
2020-04-20 22:39:14 -07:00
Max Kazantsev a116f0fa86 [LICM][NFC] Reorder checks to speed up things slightly
Side effect check is made faster than potentially heavy other checks.
2020-04-21 11:34:44 +07:00
Craig Topper 68b2e507e4 [Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78443
2020-04-20 21:31:44 -07:00
Johannes Doerfert dc3b5b00fe [OpenMPOpt] Make the combination of `ident_t*` deterministic
Before we kept the first applicable `ident_t*` during deduplication of
runtime calls. The problem is that "first" is dependent on the iteration
order of a DenseMap. Since the proper solution, which is to combine the
information from all `ident_t*`, should be deterministic on its own, we
will not try to make the iteration order deterministic. Instead, we will
create a fresh `ident_t*` if there is not a unique existing `ident_t*`
to pick.
2020-04-20 23:27:08 -05:00
Johannes Doerfert 8855fec37e [OpenMPOpt] Use a pointer value type in map
The value type was a set before which can easily lead to excessive
memory usage and copying. We use a pointer to a vector instead now.
2020-04-20 23:27:08 -05:00
Johannes Doerfert ee17263adc [OpenMPOpt] Make the SCC a vector to ensure deterministic results 2020-04-20 23:27:08 -05:00
Mircea Trofin c2d86e1f30 [llvm][NFC][CallSite] Remove CallSite from ArgumentPromotion
Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78528
2020-04-20 19:33:42 -07:00
Johannes Doerfert 87aa362985 [Attributor] Use the BumpPtrAllocator in InformationCache as well
We now also use the BumpPtrAllocator from the Attributor in the
InformationCache. The lifetime of objects in either is pretty much the
same and it should result in consistently good performance regardless of
the allocator.

Doing so requires to call more constructors manually but so far that
does not seem to be problematic or messy.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 615359 (368257/s)
temporary memory allocations: 83315 (49859/s)
peak heap memory consumption: 75.64MB
peak RSS (including heaptrack overhead): 163.43MB
total memory leaked: 269.04KB
```

After:
```
calls to allocation functions: 613042 (359555/s)
temporary memory allocations: 83322 (48869/s)
peak heap memory consumption: 75.64MB
peak RSS (including heaptrack overhead): 162.92MB
total memory leaked: 269.04KB
```

Difference:
```
calls to allocation functions: -2317 (-68147/s)
temporary memory allocations: 7 (205/s)
peak heap memory consumption: 2.23KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

---
2020-04-20 21:12:41 -05:00
Mircea Trofin 15cd1e36e4 [llvm][NFC][CallSite] Remove CallSite from CoroEarly
Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78523
2020-04-20 18:15:25 -07:00
Sriraman Tallam 365b60fc93 New pass to make internal linkage symbol names unique.
With clang option -funique-internal-linkage-symbols, symbols with
internal linkage get names with the module hash appended.

Differential Revision: https://reviews.llvm.org/D78243
2020-04-20 15:05:22 -07:00
Craig Topper fcc9d70260 Revert "[Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign."
This is breaking the clang build.

This reverts commit 897409fb56.
2020-04-20 13:25:06 -07:00
Craig Topper 897409fb56 [Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78443
2020-04-20 13:08:05 -07:00
Nikita Popov 54d01cbc15 [IPT] Don't use OrderedInstructions (NFC)
Use Instruction::comesBefore() instead of OrderedInstructions
inside InstructionPrecedenceTracking. This also removes the
dominator tree dependency.

Differential Revision: https://reviews.llvm.org/D78461
2020-04-20 18:25:31 +02:00
Bjorn Pettersson a8a31fdd80 [Scalarizer] Fix a non-deterministic scatter order problem
Summary:
The indexing operator in Scatterer may result in building new
instructions. When using multiple such operators in a function
argument list the order in which we build instructions depend on
argument evaluation order (which is undefined in C++).
This patch avoid such problems by expanding the components using
the [] operator prior to the function call.

Problem was seen when comparing output, while builing LLVM with
different compilers (clang vs gcc).

Reviewers: foad, cameron.mcinally, uabelho

Reviewed By: foad

Subscribers: hiraditya, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78455
2020-04-20 16:05:33 +02:00
Florian Hahn fa284e136e [VPlan] Clean up tryToCreate(Widen)Recipe. (NFC)
This patch includes some clean-ups to tryToCreateRecipe, suggested in
D77973.

It includes:
  * Renaming tryToCreateRecipe to tryToCreateWidenRecipe.
  * Move VPBB insertion logic to caller of tryToCreateWidenRecipe.
  * Hoists instruction checks to tryToCreateWidenRecipe, making it
    clearer which instructions are handled by which recipe, simplifying
    the checks by using early exits.
  * Split up handling of induction PHIs and truncates using inductions.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D78287
2020-04-20 10:06:35 +01:00
Florian Hahn 4331b3812a [PredicateInfo] Use new Instruction::comesBefore instead of OI (NFC).
The recently added Instruction::comesBefore can be used instead of
OrderedInstructions.

Reviewers: rnk, nikic, efriedma

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D78452
2020-04-20 09:22:21 +01:00
Sam Parker e3056ae9a0 [NFC][TTI] Explicit use of VectorType
The API for shuffles and reductions uses generic Type parameters,
instead of VectorType, and so assertions and casts are used a lot.
This patch makes those types explicit, which means that the clients
can't be lazy, but results in less ambiguity, and that can only be a
good thing.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562

Differential Revision: https://reviews.llvm.org/D78357
2020-04-20 09:16:52 +01:00
Craig Topper 53ee8fbc23 [CallSite removal][SCCP] Use CallBase instead of CallSite. NFC
Differential Revision: https://reviews.llvm.org/D78470
2020-04-20 00:16:09 -07:00
Craig Topper 4cf6d4ab48 [CallSite removal][CalledValuePropagation] Use CallBase instead of CallSite. NFC
Differential Revision: https://reviews.llvm.org/D78467
2020-04-19 22:05:40 -07:00
Florian Hahn a7aaadc135 [TTI] Clean up includes (NFC).
Remove some unnecessary includes, replace some with forward
declarations.

This also exposed a few places that were missing some includes.
2020-04-19 20:11:59 +01:00
Florian Hahn 32af48cdcf [IVDescriptors] Clean up includes.
Some includes are not required and forward declarations can be used
instead. This also exposed a few places that were not directly including
required files.
2020-04-19 20:07:47 +01:00
Florian Hahn 7a87e8f90b [LoopUtils] Clean up includes, use forward decls if appropriate (NFC).
Most of the includes in LoopUtils.h are not required in the header and
they can be replaced by forward declarations.

Unfortunately includes of TargetTransformInfo.h and IVDescriptors.h pull
in a bunch of additional things, but there is no easy way to get rid of
them at the moment I think.
2020-04-19 19:44:29 +01:00
Sanjay Patel bef6e67e95 [VectorCombine] transform bitcasted shuffle to wider elements
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'

This is the widen shuffle elements enhancement to D76727.
It builds on the analysis and simplifications in
D77881 and rG6a7e958a423e.

The phase ordering tests show that we can simplify inverse
shuffles across a binop in both directions (widen/narrow or
narrow/widen) now.

There's another potential transform visible in some of the
remaining TODOs - move a bitcasted operand of a shuffle
after the shuffle.

Differential Revision: https://reviews.llvm.org/D78371
2020-04-19 08:24:38 -04:00
Benjamin Kramer ff54d1c897 Remove remaining callers of CreateShuffleVector with unsigned indices and mark it as deprecated
No functionality change intended.
2020-04-19 11:48:28 +02:00
Florian Hahn 6ba0695c60 [ValueLattice] Add struct for merge options.
This makes it easier to extend the merge options in the future and also
reduces the risk of accidentally setting a wrong option.

Reviewers: efriedma, nikic, reames, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78368
2020-04-19 09:03:16 +01:00
Ayal Zaks 8e0c5f7200 [LV] Mark first-order recurrences as allowed exits
First-order recurrences require special treatment when they are live-out;
such treatment is provided by fixFirstOrderRecurrence(), so they should be
included in AllowedExit set.

(Should probably have been included originally in D16197.)

Fixes PR45526: AllowedExit set is used by prepareToFoldTailByMasking() to
check whether the treatment for live-outs also holds when folding the tail,
which is not (yet) the case for first-order recurrences.

Differential Revision: https://reviews.llvm.org/D78210
2020-04-18 23:54:21 +03:00
Craig Topper 7fde990694 Recommit "[Local] Simplify the alignment limits in getOrEnforceKnownAlignment. NFCI"
With a tweak to avoid a linker error for passing
MaxAlignmentExponent by reference to std::min.
2020-04-18 13:51:57 -07:00
Nikita Popov a42fd18d0f [PredicateInfo] Factor out PredicateInfoBuilder (NFC)
When running IPSCCP on a module with many small functions, memory
usage is dominated by PredicateInfo, which is a huge structure
(partially due to some unfortunate nested SmallVector use). However,
most of it is actually only temporary state needed to build
predicate info, and does not need to be retained after initial
construction.

This patch factors out the predicate building logic and state
into a separate PrediceInfoBuilder, with the extra bonus that
it does not need to live in the header anymore.

Differential Revision: https://reviews.llvm.org/D78326
2020-04-18 22:34:38 +02:00
Craig Topper 44d63b7528 Revert "[Local] Simplify the alignment limits in getOrEnforceKnownAlignment. NFCI"
This reverts commit e00cfe254d.

Seems to be causing a linker error on the build bots.
2020-04-18 13:23:29 -07:00
Craig Topper e00cfe254d [Local] Simplify the alignment limits in getOrEnforceKnownAlignment. NFCI
We previously clamped the trailing zero count to 31 bits. And
then clamped the final alignment to MaximumAlignment which is
1 << 29.

This patch simplifies this to just clamp the trailing zero to
29 using MaxAlignmentExponent.

I was looking into changing this function to use Align/MaybeAlign
and noticed this.

Differential Revision: https://reviews.llvm.org/D78418
2020-04-18 12:52:47 -07:00
Florian Hahn 46853b95ca [SCCP] Drop unused early exit from visitStoreInst (NFC).
There are no lattice values associated with store instructions
directly. They will never get marked as overdefined.
2020-04-18 19:44:54 +01:00
Florian Hahn 034e8d58a8 [SCCP] Drop unused early exit from visitReturnInst (NFC).
There are no lattice values associated with return instructions
directly. They will never get marked as overdefined.
2020-04-18 13:52:41 +01:00
Florian Hahn 4ee45ab60f [LV] Invalidate cost model decisions along with interleave groups.
Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). When invalidating the
interleave groups, those decisions also need to be invalidated.

Otherwise there is a mis-match during VPlan construction.
VPWidenMemoryRecipes created initially are left around w/o converting them
into VPInterleave recipes. Such a conversion indeed should not take place,
and these gather/scatter recipes may in fact be right. The crux is leaving around
obsolete CM_Interleave (and dependent) markings of instructions along with
their costs, instead of recalculating decisions, costs, and recipes.

Alternatively to forcing a complete recompute later on, we could try
to selectively invalidate the decisions connected to the interleave
groups. But we would likely need to run the uniform/scalar value
detection parts again anyways and the extra complexity is probably not
worth it.

Fixes PR45572.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D78298
2020-04-18 10:23:49 +01:00
Mircea Trofin 41ad8b7388 [llvm][NFC][CallSite] Remove CallSite from Evaluator.
Reviewers: craig.topper, dblaikie

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78395
2020-04-17 19:11:17 -07:00
Anna Thomas fd5e069d23 Fix buildbot failure due to obsolete CallSite usage
Fix buildbot failures due to ef49b1d97e
(which was a revert of a previous change).
2020-04-17 17:46:19 -04:00
Anna Thomas ef49b1d97e Revert "[InlineFunction] Update metadata on loads that are return values"
This reverts commit 1d0f757904 because of
https://bugs.llvm.org/show_bug.cgi?id=45590. Needs investigation.
2020-04-17 17:23:00 -04:00
Craig Topper 5f6d93c7d3 [CallSite removal][Attributor] Replaces use of CallSite with CallBase. NFC
Differential Revision: https://reviews.llvm.org/D78343
2020-04-17 10:44:31 -07:00
Craig Topper 0feaba683e [CallSite removal][MemCpyOptimizer] Replace CallSite with CallBase. NFC
There are also some adjustments to use MaybeAlign in here due
to CallBase::getParamAlignment() being deprecated. It would
be a little cleaner if getOrEnforceKnownAlignment was migrated
to Align/MaybeAlign.

Differential Revision: https://reviews.llvm.org/D78345
2020-04-17 10:32:45 -07:00
Craig Topper 8c94d616e1 Revert "[CallSite removal][MemCpyOptimizer] Replace CallSite with CallBase. NFC"
There were extra changes that weren't supposed to be in there

This reverts commit b91f78db37.
2020-04-17 10:11:22 -07:00
Craig Topper b91f78db37 [CallSite removal][MemCpyOptimizer] Replace CallSite with CallBase. NFC
There are also some adjustments to use MaybeAlign in here due
to CallBase::getParamAlignment() being deprecated. It would
be cleaner if getOrEnforceKnownAlignment was migrated
to Align/MaybeAlign.

Differential Revision: https://reviews.llvm.org/D78345
2020-04-17 10:07:20 -07:00
Florian Hahn c245d3e033 [ValueLattice] Steal bits from Tag to track range extensions (NFC).
Users of ValueLatticeElement currently have to ensure constant ranges
are not extended indefinitely. For example, in SCCP, mergeIn goes to
overdefined if a constantrange value is repeatedly merged with larger
constantranges. This is a simple form of widening.

In some cases, this leads to an unnecessary loss of information and
things can be improved by allowing a small number of extensions in the
hope that a fixed point is reached after a small number of steps.

To make better decisions about widening, it is helpful to keep track of
the number of range extensions. That state is tied directly to a
concrete ValueLatticeElement and some unused bits in the class can be
used. The current patch preserves the existing behavior by default:
CheckWiden defaults to false and if CheckWiden is true, a single change
to the range is allowed.

Follow-up patches will slightly increase the threshold for widening.

Reviewers: efriedma, davide, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78145
2020-04-17 15:38:23 +01:00
Benjamin Kramer c5e7c2691d Remove accidental include.
Thank you clangd.
2020-04-17 16:36:30 +02:00
Benjamin Kramer b639091c02 Change users of CreateShuffleVector to pass the masks as int instead of Constants
No functionality change intended.
2020-04-17 16:34:29 +02:00
Benjamin Kramer 166467e822 [VectorUtils] Create shufflevector masks as int vectors instead of Constants
No functionality change intended.
2020-04-17 15:28:00 +02:00
Max Kazantsev 72c13446ce [NFC] Add missing 'const' notion to LCSSA-related functions
These functions don't really do any changes to loop info or
dominator tree. We should state this explicitly using 'const'.
2020-04-17 17:49:34 +07:00
Simon Pilgrim fa7f328a15 [cmake] LLVMVectorize - add include/llvm/Transforms/Vectorize header path
MSVC projects were missing the llvm/Transforms/Vectorize/* headers
2020-04-17 11:06:26 +01:00
Craig Topper 5034df8600 [SampleProfile] Use CallBase in function arguments and data structures to reduce the number of explicit casts. NFCI
Removing CallSite left us with a bunch of explicit casts from
Instruction to CallBase. This moves the casts earlier so that
function arguments and data structure types are CallBase so
we don't have to cast when we use them.

Differential Revision: https://reviews.llvm.org/D78246
2020-04-16 22:10:34 -07:00
Craig Topper 798b262c3c [CallSite removal][IPO] Change implementation of AbstractCallSite to store a CallBase* instead of CallSite. NFCI.
CallSite will likely be removed soon, but AbstractCallSite serves a different purpose and won't be going away.

This patch switches it to internally store a CallBase* instead of a
CallSite. The only interface changes are the removal of the getCallSite
method and getCallBackUses now takes a CallBase&. These methods had only
a few callers that were easy enough to update without needing a
compatibility shim.

In the future once the other CallSites are gone, the CallSite.h
header should be renamed to AbstractCallSite.h

Differential Revision: https://reviews.llvm.org/D78322
2020-04-16 16:24:45 -07:00
Bob Haarman cc5c58889e [WPD] Avoid noalias assumptions in unique return value optimization
Summary:
Changes the type of the @__typeid_.*_unique_member imports we generate
for unique return value optimization from i8 to [0 x i8]. This
prevents assuming that these imports do not alias, such as when
two unique return values occur in the same vtable.

Fixes PR45393.

Reviewers: tejohnson, pcc

Reviewed By: pcc

Subscribers: aganea, hiraditya, rnk, george.burgess.iv, dblaikie, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77421
2020-04-16 14:49:51 -07:00
Roman Lebedev b1fbf438f6
[OpenMPOpt] deduplicateRuntimeCalls(): avoid traditional map lookup pitfall
Summary:
This roughly halves time spent in that pass,
while unsurprisingly significantly reducing total memory usage.

This makes sense because most functions won't use any openmp functions..

old
```
   0.2329 (  0.5%)   0.0409 (  0.9%)   0.2738 (  0.5%)   0.2736 (  0.5%)  OpenMP specific optimizations
```
```
total runtime: 63.32s.
bytes allocated in total (ignoring deallocations): 8.34GB (131.70MB/s)
calls to allocation functions: 14526259 (229410/s)
temporary memory allocations: 3335760 (52680/s)
peak heap memory consumption: 324.36MB
peak RSS (including heaptrack overhead): 5.39GB
total memory leaked: 289.93MB
```

new
```
   0.1457 (  0.3%)   0.0276 (  0.6%)   0.1732 (  0.3%)   0.1731 (  0.3%)  OpenMP specific optimizations
```
```
total runtime: 55.01s.
bytes allocated in total (ignoring deallocations): 6.70GB (121.89MB/s)
calls to allocation functions: 14268205 (259398/s)
temporary memory allocations: 3225355 (58637/s)
peak heap memory consumption: 324.09MB
peak RSS (including heaptrack overhead): 5.39GB
total memory leaked: 289.87MB
```

diff
```
total runtime: -8.31s.
bytes allocated in total (ignoring deallocations): -1.63GB (196.58MB/s)
calls to allocation functions: -258054 (31034/s)
temporary memory allocations: -110405 (13277/s)
peak heap memory consumption: -262.36KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -61.45KB
```

Reviewers: jdoerfert, hfinkel

Reviewed By: jdoerfert

Subscribers: yaxunl, hiraditya, guansong, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78299
2020-04-16 19:54:02 +03:00
Bjorn Pettersson fdf9bad573 [Float2Int] Stop passing around a reference to the class member Roots. NFC
The Float2IntPass got a class member called Roots, but Roots
was also passed around to member function as a reference. This
patch simply remove those references.
2020-04-16 15:24:13 +02:00
Johannes Doerfert c4d3188adb [Attributor][NFC] Reduce indention for call site attribute seeding
Also added a TODO to remind us that indirect calls could be optimized as
well.
2020-04-16 02:32:31 -05:00
Johannes Doerfert 0741dec27b [Attributor][FIX] Handle droppable uses when replacing values
Since we use the fact that some uses are droppable in the Attributor we
need to handle them explicitly when we replace uses. As an example, an
assumed dead value can have live droppable users. In those we cannot
replace the value simply by an undef. Instead, we either drop the uses
(via `dropDroppableUses`) or keep them as they are. In this patch we do
both, depending on the situation. For values that are dead but not
necessarily removed we keep droppable uses around because they contain
information we might be able to use later. For values that are removed
we drop droppable uses explicitly to avoid replacement with undef.
2020-04-16 00:56:08 -05:00
Johannes Doerfert ea7f17ee38 [InstCombine] Simplify calls with casted `returned` attribute
The handling of the `returned` attribute in D75815 did miss the case
where the argument is (bit)casted to a different type. This is
explicitly allowed by the language reference and exposed by the
Attributor.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D77977
2020-04-16 00:56:00 -05:00
Johannes Doerfert 253d6be0f6 [Attributor][FIX] Properly check for accesses to globals
The check if globals were accessed was not always working because two
bits are set for NO_GLOBAL_MEM. The new check works also if only on kind
of globals (internal/external) is accessed.
2020-04-16 00:55:34 -05:00
Johannes Doerfert ad9c284cc3 [Attributor][NFC] Run the verifier only on functions and under EXPENSIVE_CHECKS
Running the verifier is expensive so we want to avoid it even in runs
that enable assertions. As we move closer to enabling the Attributor
this code will be executed by some buildbots but not cause overhead for
most people.
2020-04-16 00:55:33 -05:00
Craig Topper 8e1408695c [CallSite removal][TargetLibraryInfo] Replace ImmutableCallSite with CallBase in one of the getLibFunc signatures. NFC
Differential Revision: https://reviews.llvm.org/D78083
2020-04-15 22:43:41 -07:00
Mircea Trofin 4213bc761a [llvm][NFC][CallSite] Removed CallSite from some implementation details.
Reviewers: craig.topper, dblaikie

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78256
2020-04-15 22:27:05 -07:00
Johannes Doerfert 898bbc252a [Attributor] Lazily collect function information
Before, we eagerly analyzed all the functions to collect information
about them, e.g. what instructions may read/write memory. This had
multiple drawbacks:
  - In CGSCC-mode we can end up looking at a callee which is not in the
    SCC but for which we need an initialized cache.
  - We end up looking at functions that we deem dead and never need to
    analyze in the first place.
  - We have a implicit dependence which is easy to break.

This patch moves the function analysis into the information cache and
makes it lazy. There is no real functional change expected except due to
the first reason above.
2020-04-15 22:26:38 -05:00
Johannes Doerfert 8c4057e3a3 [Attributor] Replace call graph call sites after function replacement
The CallGraphUpdater allows to directly alter call site information and
we should do so. This might appease the windows buildbot that crashes
during the SCC traversal.
2020-04-15 22:24:09 -05:00
Johannes Doerfert df675890b7 [CallGraphUpdater][NFC] Minor updates to D77855
I uploaded the old version accidentally instead of the one with these
minor adjustments requested by the reviewers.

Differential Revision: https://reviews.llvm.org/D77855
2020-04-15 21:26:35 -05:00
Alina Sbirlea edccc35e8f [Reassociate] Preserve AAManager and BasicAA analyses.
Now Reassociate Pass invalidates the analysis results of AAManager and BasicAA,
but it saves GlobalsAA, although it seems that it should preserve them, since
it affects only Unary and Binary operators.

Author: kpolushin (Kirill)

Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D77137
2020-04-15 16:58:03 -07:00
Johannes Doerfert 937025757c [CallGraphUpdater] Remove nodes from their SCC (old PM)
Summary:
We can and should remove deleted nodes from their respective SCCs. We
did not do this before and this was a potential problem even though I
couldn't locally trigger an issue. Since the `DeleteNode` would assert
if the node was not in the SCC, we know we only remove nodes from their
SCC and only once (when run on all the Attributor tests).

Reviewers: lebedev.ri, hfinkel, fhahn, probinson, wristow, loladiro, sstefan1, uenoku

Subscribers: hiraditya, bollu, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77855
2020-04-15 18:38:50 -05:00
Johannes Doerfert 1b34b84ddd [CallGraphUpdater] Update the ExternalCallingNode for node replacements
Summary:
While it is uncommon that the ExternalCallingNode needs to be updated,
it can happen. It is uncommon because most functions listed as callees
have external linkage, modifying them is usually not allowed. That said,
there are also internal functions that have, or better had, their
"address taken" at construction time. We conservatively assume various
uses cause the address "to be taken". Furthermore, the user might have
become dead at some point. As a consequence, transformations, e.g., the
Attributor, might be able to replace a function that is listed
as callee of the ExternalCallingNode.

Since there is no function corresponding to the ExternalCallingNode, we
did just remove the node from the callee list if we replaced it (so
far). Now it would be preferable to replace it if needed and remove it
otherwise. However, removing the node has implications on the CGSCC
iteration. Locally, that caused some other nodes to be never visited
but it is for sure possible other (bad) side effects can occur. As it
seems conservatively safe to keep the new node in the callee list we
will do that for now.

Reviewers: lebedev.ri, hfinkel, fhahn, probinson, wristow, loladiro, sstefan1, uenoku

Subscribers: hiraditya, bollu, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77854
2020-04-15 18:38:50 -05:00
Johannes Doerfert 7ec8d79385 [CallGraphUpdater] Properly remove strongly connected components (oldPM)
Summary:
The old code did eliminate references from and to functions that were
about to be deleted only just before we deleted them. This can cause
references from other functions that are supposed to be deleted to still
exist, depending on the order. If the functions form a strongly
connected component the problem manifests regardless of the order in
which we try to actually delete the functions.

This patch introduces a two step deletion. First we remove all
references and then we delete the function. Note that this only affects
the old call graph. There should not be any functional changes if no old
style call graph was given.

To test this we delete two strongly connected functions instead of one
in an existing test.

Reviewers: hfinkel

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77975
2020-04-15 18:38:49 -05:00
Craig Topper 240725666a [CallSite removal][CallSiteSplitting] Use CallBase instead of CallSite. NFC
Differential Revision: https://reviews.llvm.org/D78240
2020-04-15 15:38:02 -07:00
Craig Topper fbb804983d [CallSite removal][CloneFunction] Use CallSite instead of CallBase. NFC
Differential Revision: https://reviews.llvm.org/D78236
2020-04-15 15:38:02 -07:00
Philip Reames 80c46c53bd [PoisonChecking] Further clarify file scope comment, and update to match naming now used in code 2020-04-15 14:48:53 -07:00
Philip Reames 463513e959 [NFC] Adjust style and clarify comments in PoisonChecking 2020-04-15 14:48:53 -07:00
Philip Reames 75ca7127bc [NFC] Use new canCreatePoison to make code intent more clear in PoisonChecking 2020-04-15 14:48:53 -07:00
Craig Topper 592d8e7d75 [CallSite removal][SimpleLoopUnswitch] Use CallBase instead of CallSite. NFC
Differential Revision: https://reviews.llvm.org/D78227
2020-04-15 13:25:02 -07:00
Craig Topper 7b6ff8bf1f [CallSite removal][SampleProfile] Use CallBase instead of CallSite. NFC
Differential Revision: https://reviews.llvm.org/D78219
2020-04-15 12:47:17 -07:00
Davide Italiano 5f87415efc [LICM] Try to merge debug locations when sinking.
The current strategy LICM uses when sinking for debuginfo is
that of picking the debug location of one of the uses.
This causes stepping to be wrong sometimes, see, e.g. PR45523.

This patch introduces a generalization of getMergedLocation(),
that operates on a vector of locations instead of two, and try
to merge all them together, and use the new API in LICM.

<rdar://problem/61750950>
2020-04-15 12:29:34 -07:00
Craig Topper a0d92248ea [CallSite removal][PruneEH] Use CallBase instead of CallSite. NFC
Reviewers: mtrofin, dblaikie

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78182
2020-04-15 10:11:41 -07:00
Sanjay Patel 01bcc3e937 [InstCombine] prevent infinite loop with sub/abs of constant expression
PR45539:
https://bugs.llvm.org/show_bug.cgi?id=45539
2020-04-15 09:19:16 -04:00
Benjamin Kramer cc035d475f Upgrade users of 'new ShuffleVectorInst' to pass indices as an int array
No functionality change intended.
2020-04-15 14:29:43 +02:00
Florian Hahn 3f7f06888b [VPlan] Branches are not widened by VPWidenRecipe, assert (NFC). 2020-04-15 12:03:45 +01:00
Benjamin Kramer 6f64daca8f Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints
No functionality change intended.
2020-04-15 12:51:38 +02:00
Florian Hahn 5b4b3e0b6e [VPlan] Move widening check for non-memory/non-calls to function (NFC).
After introducing VPWidenSelectRecipe, the duplicated logic can be
shared.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77973
2020-04-15 11:48:37 +01:00
Florian Hahn cf9ee49b4d [DSE] Lift post-dominance for objs not accessible in caller.
We can eliminate MemoryDefs of objects not accessible after the function
returns (e.g. alloca), if there are no reads between the MemoryDef and
any function exits. We can stop traversing paths that completely
overwrite the memory location of the MemoryDef.

This patch was split off D73763.

Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv

Reviewed By: asbirlea, george.burgess.iv

Differential Revision: https://reviews.llvm.org/D77736
2020-04-15 11:37:14 +01:00
Sameer Sahasrabuddhe 7bb9f500e2 fix warning: specialization of template in different namespace
This is related to commit 8c11bc0cd0
which introduces the FixIrreducible pass. The warning seems hard to
reproduce locally. The latest attempt ought to work.
2020-04-15 15:57:53 +05:30
Sameer Sahasrabuddhe 8c11bc0cd0 Introduce fix-irreducible pass
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.

This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D77198

This restores commit 2ada8e2525.

Originally reverted with commit 44e09b59b8.
2020-04-15 15:05:51 +05:30
Florian Hahn 79d185c792 [VPlan] Move Load/Store checks out of tryToWiden (NFC).
Handling LoadInst and StoreInst in tryToWiden seems a bit
counter-intuitive, as there is only an assertion for them and in no
case VPWidenRefipes are created for them.

I think it makes sense to move the assertion to handleReplication, where
the non-widened loads and store are handled.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77972
2020-04-15 10:18:42 +01:00
Gil Rapaport b747d72c19 [LV] Fix PR45525: Incorrect assert in blend recipe
Fix an assert introduced in 41ed5d856c1: a phi with a single predecessor and a
mask is a valid case which is already supported by the code.

Differential Revision: https://reviews.llvm.org/D78115
2020-04-15 10:39:07 +03:00
Sameer Sahasrabuddhe 44e09b59b8 Revert "Introduce fix-irreducible pass"
This reverts commit 2ada8e2525.

Buildbots produced compilation errors which I was not able to quickly
reproduce locally. Need more time to investigate.
2020-04-15 12:19:50 +05:30
Sameer Sahasrabuddhe 2ada8e2525 Introduce fix-irreducible pass
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.

This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D77198
2020-04-15 11:29:19 +05:30
Teresa Johnson 33ffb62e23 Allow disabling of vectorization using internal options
Summary:
Currently, the internal options -vectorize-loops, -vectorize-slp, and
-interleave-loops do not have much practical effect. This is because
they are used to initialize the corresponding flags in the pass
managers, and those flags are then unconditionally overwritten when
compiling via clang or via LTO from the linkers. The only exception was
-vectorize-loops via opt because of some special hackery there.

While vectorization could still be disabled when compiling via clang,
using -fno-[slp-]vectorize, this meant that there was no way to disable
it when compiling in LTO mode via the linkers. This only affected
ThinLTO, since for regular LTO vectorization is done during the compile
step for scalability reasons. For ThinLTO it is invoked in the LTO
backends. See also the discussion on PR45434.

This patch makes it so the internal options can actually be used to
disable these optimizations. Ultimately, the best long term solution is
to mark the loops with metadata (similar to the approach used to fix
-fno-unroll-loops in D77058), but this enables a shorter term
workaround, and actually makes these internal options useful.

I constant propagated the initial values of these internal flags into
the pass manager flags (for some reasons vectorize-loops and
interleave-loops were initialized to true, while vectorize-slp was
initialized to false). As mentioned above, they are overwritten
unconditionally so this doesn't have any real impact, and these initial
values aren't particularly meaningful.

I then changed the passes to check the internl values and return without
performing the associated optimization when false (I changed the default
of -vectorize-slp to true so the options behave similarly). I was able
to remove the hackery in opt used to get -vectorize-loops=false to work,
as well as a special option there used to disable SLP vectorization.

Finally, I changed thinlto-slp-vectorize-pm.c to:
a) Only test SLP (moved the loop vectorization checking to a new test).
b) Use code that is slp vectorized when it is enabled, and check that
instead of whether the pass is enabled.
c) Test the new behavior of -vectorize-slp.
d) Test both pass managers.

The loop vectorization (and associated interleaving) testing I moved to
a new thinlto-loop-vectorize-pm.c test, with several changes:
a) Changed the flags on the interleaving testing so that it will
actually interleave, and check that.
b) Test the new behavior of -vectorize-loops and -interleave-loops.
c) Test both pass managers.

Reviewers: fhahn, wmi

Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77989
2020-04-14 18:09:10 -07:00
Mircea Trofin 447e2c3067 [llvm][NFC][CallSite] Remove Implementation uses of CallSite
Reviewers: dblaikie, davidxl, craig.topper

Subscribers: arsenm, dschuff, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78142
2020-04-14 14:49:47 -07:00
Christopher Tetreault 8226d599ff [SVE] Remove calls to getBitWidth from Transforms
Reviewers: efriedma, sdesmalen, spatel, eugenis, chandlerc

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77896
2020-04-14 14:31:42 -07:00
Huihui Zhang 5c1d1a62e3 [InstCombine][SVE] Fix visitGetElementPtrInst for scalable type.
Summary:
This patch fix the following issues in InstCombiner::visitGetElementPtrInst

    1. Skip for scalable type if transformation requires fixed size number of
    vector element.
    2. Skip for scalable type if transformation relies on compile-time known type
    alloc size.
    3. Use VectorType::getElementCount when scalable property is used to construct
    new VectorType.
    4. Use TypeSize::getKnownMinSize when minimal size of a scalable type is valid to determine GEP 'inbounds'.
    5. Explicitly call TypeSize::getFixedSize to avoid implicit type conversion to uint64_t.

Reviewers: sdesmalen, efriedma, spatel, ctetreau

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78081
2020-04-14 12:38:32 -07:00
Sanjay Patel 6a7e958a42 [InstCombine] try to reduce more shuffles with bitcasted operand
This is the widen mask element sibling to D76844.

shuf (bitcast X), undef, Mask --> bitcast X'

http://volta.cs.utah.edu:8080/z/4dt3V8
2020-04-14 15:03:59 -04:00
Benjamin Kramer 7bf166665e [FunctionAttrs] Don't copy all the nodes where a reference is fine. 2020-04-14 17:18:23 +02:00
Max Kazantsev f8a42bca28 [ADCE] Fix incorrect reporting of CFG changes
This patch fixes 2 related bugs in ADCE:
- `performDeadCodeElimination` does not report changes if it did ONLY
  CFG changes (affects both old and new pass managers);
- When control flow removal is enabled, new pass manager does not
  drop CFG analyses.

Both can lead to incorrect loop info after ADCE that does only CFG changes.

Differential Revision: https://reviews.llvm.org/D78103
Reviewed By: Denis Antrushin
2020-04-14 20:26:13 +07:00
Aaron Puchert e833e58300 [ValueLattice] Remove unused DataLayout parameter of mergeIn, NFC
Reviewed By: fhahn, echristo

Differential Revision: https://reviews.llvm.org/D78061
2020-04-14 13:32:53 +02:00
Georgii Rymar 1647ff6e27 [ADT/STLExtras.h] - Add llvm::is_sorted wrapper and update callers.
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.

Differential revision: https://reviews.llvm.org/D78016
2020-04-14 14:11:02 +03:00
Florian Hahn 38609fa9e4 Recommit "[SCCP] Use SimplifyBinOp for non-integer constant/expressions & overdef."
This includes a fix reported with simplifications in the presence of
NaN.

This reverts the revert commit 06408451bf.
2020-04-14 11:48:52 +01:00
Tyker 3bdfa966ec [AssumeBundles] preserve knowledge in DCE
Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77403
2020-04-14 12:48:15 +02:00
Tyker 086de7673e [AssumeBundles] preserve knowledge in DSE
Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77404
2020-04-14 12:48:15 +02:00
Tyker de4dc275f5 [AssumeBundles] preserve information in NewGVN
Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: Prazek, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77406
2020-04-14 12:48:14 +02:00
Tyker c35194b800 [AssumeBundles] preserve information in LICM
Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77407
2020-04-14 12:48:14 +02:00
Tyker 1d2b76a8fc [AssumeBundles] adapte GVN to assume bundles
Summary:
prevent GVN from removing assume bundles
make GVN preserve information from removed instructions

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77405
2020-04-14 12:48:14 +02:00
Pratyai Mazumder 0c61e91100 [SanitizerCoverage] The section name for inline-bool-flag was too long for darwin builds, so shortening it.
Summary:
Following up on the comments on D77638.

Not undoing rGd6525eff5ebfa0ef1d6cd75cb9b40b1881e7a707 here at the moment, since I don't know how to test mac builds. Please let me know if I should include that here too.

Reviewers: vitalybuka

Reviewed By: vitalybuka

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77889
2020-04-14 02:06:33 -07:00
Mircea Trofin 4aae4e3f48 [llvm][NFC] CallSite removal from inliner-related files
Summary: This removes CallSite from inliner files. Some dependencies where thus affected.

Reviewers: dblaikie, davidxl, craig.topper

Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, aheejin, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77991
2020-04-13 21:28:58 -07:00
Mehdi Amini 384ca190ae Revert "Move ModuleSummaryAnalysis from libAnalysis to libObject to break the dependency from Analysis to Object"
This reverts commit 10df1563d6.

Some buildbots are broken.
2020-04-14 00:27:08 +00:00
Mehdi Amini 10df1563d6 Move ModuleSummaryAnalysis from libAnalysis to libObject to break the dependency from Analysis to Object
ModuleSummaryAnalysis is the only file in libAnalysis that brings a
dependency on the CodeGen layer from libAnalysis, moving it breaks this
dependency.

Differential Revision: https://reviews.llvm.org/D77994
2020-04-13 23:12:11 +00:00
Benjamin Kramer f1542efd97 [CHR] Clean up some code and reduce copying. NFCI. 2020-04-13 23:11:20 +02:00
Christopher Tetreault 3297e9b7c3 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: rriddle, sdesmalen, efriedma

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77259
2020-04-13 12:29:43 -07:00
Benjamin Kramer ec228d722c [InstCombine] Use SmallBitVector for convienently checking if all bits are set 2020-04-13 20:37:15 +02:00
Vedant Kumar 4831f4b7bd [InstCombine] Fix debug variance issue in tryToMoveFreeBeforeNullTest
Fix an issue where the presence of debug info could disable an
optimization in tryToMoveFreeBeforeNullTest.
2020-04-13 10:55:17 -07:00
Vedant Kumar 122a6bfb07 [Debugify] Strip added metadata in the -debugify-each pipeline
Summary:
Share logic to strip debugify metadata between the IR and MIR level
debugify passes. This makes it simpler to hunt for bugs by diffing IR
with vs. without -debugify-each turned on.

As a drive-by, fix an issue causing CallGraphNodes to become invalid
when a dead llvm.dbg.value prototype is deleted.

Reviewers: dsanders, aprantl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77915
2020-04-13 10:55:17 -07:00
Gil Rapaport 41ed5d856c [LV] Clean up vectorizeInterleaveGroup (NFCI)
Pass from the calling recipe the interleave group itself instead of passing the
group's insertion position and having the function query CM for its interleave
group and making sure that given instruction is the insertion point of.

Differential Revision: https://reviews.llvm.org/D78002
2020-04-13 13:15:06 +03:00
Tyker 813f438baa [AssumeBundles] adapt Assumption cache to assume bundles
Summary: change assumption cache to store an assume along with an index to the operand bundle containing the knowledge.

Reviewers: jdoerfert, hfinkel

Reviewed By: jdoerfert

Subscribers: hiraditya, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77402
2020-04-13 12:04:51 +02:00
Benjamin Kramer 06408451bf Revert "[SCCP] Use SimplifyBinOp for non-integer constant/expressions & overdef."
This reverts commit 1a02aaeaa4. Crashes on
the following test case:

$ cat crash.ll
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [24 x i8] c"\00\00\C0\7F\00\00\C0\7F\09\85\08?\ED\C94\FE~\EB/\F3\90\CF\BA\C1"
@1 = private unnamed_addr constant [24 x i8] c"\00\00\C0\7F\A3\A0\0FA\00\00\C0\7F\00\00\C0\7F\00\00\00\00\02\9AA\00"

define void @IgammaSpecialValues.448() {
entry:
  br label %fusion.26.loop_header.dim.0

fusion.26.loop_header.dim.0:                      ; preds = %fusion.26.loop_header.dim.0, %entry
  %fusion.26.invar_address.dim.0.0 = phi i64 [ 0, %entry ], [ %invar.inc17, %fusion.26.loop_header.dim.0 ]
  %0 = getelementptr inbounds [6 x float], [6 x float]* bitcast ([24 x i8]* @0 to [6 x float]*), i64 0, i64 %fusion.26.invar_address.dim.0.0
  %1 = load float, float* %0
  %2 = fmul float %1, 0.000000e+00
  %3 = getelementptr inbounds [6 x float], [6 x float]* bitcast ([24 x i8]* @1 to [6 x float]*), i64 0, i64 %fusion.26.invar_address.dim.0.0
  %4 = load float, float* %3
  %5 = fneg float %4
  %6 = fadd float %2, %5
  %invar.inc17 = add nuw nsw i64 %fusion.26.invar_address.dim.0.0, 1
  br label %fusion.26.loop_header.dim.0
}

$ opt -ipsccp -S < crash.ll
opt: llvm/include/llvm/Analysis/ValueLattice.h:251: bool llvm::ValueLatticeElement::markConstant(llvm::Constant *, bool): Assertion `getConstant() == V && "Marking constant with different value"' failed.
2020-04-13 11:23:26 +02:00
Florian Hahn 18138e0252 [VPlan] Introduce VPWidenSelectRecipe (NFC).
Widening a selects depends on whether the condition is loop invariant or
not. Rather than checking during codegen-time, the information can be
recorded at the VPlan construction time.

This was suggested as part of D76992, to reduce the reliance on
accessing the original underlying IR values.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77869
2020-04-13 08:35:28 +01:00
Eli Friedman cfb844265a [GlobalOpt] Explicitly set alignment of bool load/store operations. 2020-04-12 16:03:12 -07:00
Huihui Zhang 4bde7c5986 [NFC] Use VectorType::isScalable to align with ongoing VectorType refactor. 2020-04-12 15:39:13 -07:00
Mircea Trofin d2f1cd5d97 [llvm][NFC] Refactor uses of CallSite to CallBase - call promotion
Summary:
Updated CallPromotionUtils and impacted sites. Parameters that are
expected to be non-null, and return values that are guranteed non-null,
were replaced with CallBase references rather than pointers.

Left FIXME in places where more changes are facilitated by CallBase, but
aren't CallSites: Instruction* parameters or return values, for example,
where the contract that they are actually CallBase values.

Reviewers: davidxl, dblaikie, wmi

Reviewed By: dblaikie

Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77930
2020-04-12 08:27:29 -07:00
Florian Hahn ae1e353a25 [VPlan] Turn classes with all public members into structs (NFC).
struct should be used when all members are public:
 https://llvm.org/docs/CodingStandards.html#use-of-class-and-struct-keywords

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77865
2020-04-12 11:03:39 +01:00
Sanjay Patel 1318ddbc14 [VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.

While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
2020-04-11 10:05:49 -04:00
Benjamin Kramer e590bd6b92 [argpromote] Use formatv to simplify code. NFCI. 2020-04-11 14:54:32 +02:00
Florian Hahn 719846c469 [VPlan] Drop redundant private: at beginning of class defs (NFC).
Default visibility for classes is private, so the private: at the top of
various class definitions is redundant.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77810
2020-04-11 13:27:10 +01:00
Huihui Zhang 6e7eeb44b3 [GVN] Fix VNCoercion for Scalable Vector.
Summary:
For VNCoercion, skip scalable vector when analysis rely on fixed size,
otherwise call TypeSize::getFixedSize() explicitly.

Add unit tests to check funtionality of GVN load elimination for scalable type.

Reviewers: sdesmalen, efriedma, spatel, fhahn, reames, apazos, ctetreau

Reviewed By: efriedma

Subscribers: bjope, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76944
2020-04-10 17:49:07 -07:00
Eric Christopher 45dca04395 Exclude bitcast and ext/trunc signbit optimization on ppc_fp128
Revision a1c05fe <https://reviews.llvm.org/rGa1c05fe20f3def1f1be9f50d2adefc6b6f1578ad>
removed bitcast from the list of problematic transformations, however:

  %97 = fptrunc ppc_fp128 %2 to double            // we need to check ppc_fp128 here to prevent the transformation
  %98 = bitcast double %97 to i64                 // a1c05fe checks ppc_fp128 at here
  %99 = icmp slt i64 %98, 0
  %100 = zext i1 %99 to i8
  store i8 %100, i8* %7, align 1

so this patch does that. I'm also disabling it in the presence of extend just in case.

I verified separately that the hash of -std::infinity and std::infinity don't match now.

Differential Revision: https://reviews.llvm.org/D77911
2020-04-10 17:07:55 -07:00
Mircea Trofin da9bcdaad9 [llvm][NFC] Inliner.cpp: ensure InlineHistory ID is always initialized;
Summary:
The inline history is associated with a call site. There are two locations
we fetch inline history. In one, we fetch it together with the call
site. In the other, we initialize it under certain conditions, use it
later under same conditions (different if check), and otherwise is
uninitialized. Although currently there is no uninitialized use, the
code is more challenging to maintain correctly, than if the value were
always initialized.

Changed to the upfront initialization pattern already present in this
file.

Reviewers: davidxl, dblaikie

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77877
2020-04-10 15:28:53 -07:00
Matt Morehouse bef187c750 Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.

Patch by Yannis Juglaret of DGA-MI, Rennes, France

libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.

Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.

The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.

Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:

```
  # (a)
  src:*
  fun:*

  # (b)
  src:SRC/*
  fun:*

  # (c)
  src:SRC/src/woff2_dec.cc
  fun:*
```

Running the built fuzzers shows how many instrumentation points the compiler adds, the fuzzer will output //XXX PCs//. Whitelist (a) is the instrument-everything whitelist, it produces 11912 instrumentation points. Whitelist (b) focuses coverage to instrument woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumented instrumentation points. Whitelist (c) focuses coverage to only instrument functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.

For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.

We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.

The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not tamper with bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.

Reviewers: kcc, morehouse, vitalybuka

Reviewed By: morehouse, vitalybuka

Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D63616
2020-04-10 10:44:03 -07:00
Mircea Trofin f62335b534 [llvm][NFC] Style fixes in Inliner.cpp
Summary:
Function names: camel case, lower case first letter.
Variable names: start with upper letter. For iterators that were 'i',
renamed with a descriptive name, as 'I' is 'Instruction&'.

Lambda captures simplification.

Opportunistic boolean return simplification.

Reviewers: davidxl, dblaikie

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77837
2020-04-10 08:04:39 -07:00
Ilya Leoshkevich 3bc439bdff [MSan] Add instrumentation for SystemZ
Summary:
This patch establishes memory layout and adds instrumentation. It does
not add runtime support and does not enable MSan, which will be done
separately.

Memory layout is based on PPC64, with the exception that XorMask
is not used - low and high memory addresses are chosen in a way that
applying AndMask to low and high memory produces non-overlapping
results.

VarArgHelper is based on AMD64. It might be tempting to share some
code between the two implementations, but we need to keep in mind that
all the ABI similarities are coincidental, and therefore any such
sharing might backfire.

copyRegSaveArea() indiscriminately copies the entire register save area
shadow, however, fragments thereof not filled by the corresponding
visitCallSite() invocation contain irrelevant data. Whether or not this
can lead to practical problems is unclear, hence a simple TODO comment.
Note that the behavior of the related copyOverflowArea() is correct: it
copies only the vararg-related fragment of the overflow area shadow.

VarArgHelper test is based on the AArch64 one.

s390x ABI requires that arguments are zero-extended to 64 bits. This is
particularly important for __msan_maybe_warning_*() and
__msan_maybe_store_origin_*() shadow and origin arguments, since non
zeroed upper parts thereof confuse these functions. Therefore, add ZExt
attribute to the corresponding parameters.

Add ZExt attribute checks to msan-basic.ll. Since with
-msan-instrumentation-with-call-threshold=0 instrumentation looks quite
different, introduce the new CHECK-CALLS check prefix.

Reviewers: eugenis, vitalybuka, uweigand, jonpa

Reviewed By: eugenis

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits, stefansf, Andreas-Krebbel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76624
2020-04-10 16:53:49 +02:00
Christopher Tetreault 3bebf02861 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: sdesmalen, rriddle, efriedma

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77262
2020-04-10 07:47:19 -07:00
Florian Hahn 1a02aaeaa4 [SCCP] Use SimplifyBinOp for non-integer constant/expressions & overdef.
For non-integer constants/expressions and overdefined, I think we can
just use SimplifyBinOp to do common folds. By just passing a context
with the DL, SimplifyBinOp should not try to get additional information
from looking at definitions.

For overdefined values, it should be enough to just pass the original
operand.

Note: The comment before the `if (isconstant(V1State)...` was wrong
originally: isConstant() also matches integer ranges with a single
element. It is correct now.

Reviewers: efriedma, davide, mssimpso, aartbik

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76459
2020-04-10 11:02:57 +01:00
John McCall 8423a6f363 Rename OptimalLayout to OptimizedStructLayout at Chris's request. 2020-04-10 00:14:20 -04:00
Max Kazantsev 4e87823026 [LoopLoadElim] Fix crash by always checking simplify form
Loop simplify form should always be checked because logic of
propagateStoredValueToLoadUsers relies on it (in particular, it
requires preheader).

Reviewed By: Fedor Sergeev, Florian Hahn
Differential Revision: https://reviews.llvm.org/D77775
2020-04-10 09:23:28 +07:00
Mircea Trofin 655aa1ae4a [llvm][NFC] Replace CallSite with CallBase in Inliner
Summary:
*Almost* all uses are replaced. Left FIXMEs for the two sites that
require refactoring outside of Inliner, to scope this patch.

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77817
2020-04-09 15:01:58 -07:00
Christopher Tetreault 19cc9b9ded Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: efriedma, sdesmalen, rriddle

Reviewed By: sdesmalen

Subscribers: hiraditya, dantrushin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77261
2020-04-09 14:59:14 -07:00
Christopher Tetreault 00a1032412 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: rriddle, sdesmalen, efriedma

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77260
2020-04-09 13:35:41 -07:00
Zequan Wu eccfa35d53 Fix lifetime call in landingpad blocking Simplifycfg pass
Fix lifetime call in landingpad blocks simplifycfg from removing the
landingpad.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D77188
2020-04-09 13:07:32 -07:00
Gil Rapaport e2a1867880 [LV] Add VPValue operands to VPBlendRecipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces VPValues for VPBlendRecipe to use as the values to
blend. The recipe is generated with VPValues wrapping the phi's incoming values
of the scalar phi. This reduces ingredient def-use usage by ILV as a step
towards full VPlan-based def-use relations.

Differential Revision: https://reviews.llvm.org/D77539
2020-04-09 18:48:33 +03:00
Ayal Zaks 1678489234 [LV] FoldTail w/o Primary Induction
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector
induction for use in fold-tail-with-masking, if a primary induction is absent.

The canonical scalar IV having start = 0 and step = VF*UF, created during code
-gen to control the vector loop, is widened into a canonical vector IV having
start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and
step = <VF*UF, VF*UF, ..., VF*UF>.

Differential Revision: https://reviews.llvm.org/D77635
2020-04-09 17:45:23 +03:00
Sanjay Patel 812970edda [InstCombine] replace undef in vector constant for safe shift transform (PR45447)
As noted in PR45447, we have a vector-constant-with-undef-element transform bug:
https://bugs.llvm.org/show_bug.cgi?id=45447

We replace undefs with a safe constant (0 or -1) based on the (non-)negative
predicate constraint.

So this is correct:
http://volta.cs.utah.edu:8080/z/WZE36H
...but this is not:
http://volta.cs.utah.edu:8080/z/boj8gJ

Previously, we were relying on getSafeVectorConstantForBinop() in the related fold (D76800).
But that's making an assumption about what qualifies as "safe", and that assumption may
not always hold.

Differential Revision: https://reviews.llvm.org/D77739
2020-04-09 08:00:46 -04:00
Anton Bikineev 9e1ccec8d5 tsan: don't instrument __attribute__((naked)) functions
Naked functions are required to not have compiler generated
prologues/epilogues, hence no instrumentation is needed for them.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45400

Differential Revision: https://reviews.llvm.org/D77477
2020-04-09 13:47:47 +02:00
Florian Hahn a7efe06af0 [LV] Assert no DbgInfoIntrinsic calls are passed to widening (NFC).
When building a VPlan, BasicBlock::instructionsWithoutDebug() is used to
iterate over the instructions in a block. This means that no recipes
should be created for debug info intrinsics already and we can turn the
early exit into an assertion.

Reviewers: Ayal, gilr, rengolin, aprantl

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D77636
2020-04-09 11:37:32 +01:00
Florian Hahn 9997ee23ed [VPlan] Add & use VPValue operands for VPWidenCallRecipe (NFC).
This patch adds VPValue versions for the arguments of the call to
VPWidenCallRecipe and uses them during code-generation.

Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77655
2020-04-09 10:23:26 +01:00
Jay Foad c63aed890e [KnownBits] Move AND, OR and XOR logic into KnownBits
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.

This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74060
2020-04-09 10:10:37 +01:00
Serge Pavlov c7ff5b38f2 [FPEnv] Use single enum to represent rounding mode
Now compiler defines 5 sets of constants to represent rounding mode.
These are:

1. `llvm::APFloatBase::roundingMode`. It specifies all 5 rounding modes
defined by IEEE-754 and is used in `APFloat` implementation.

2. `clang::LangOptions::FPRoundingModeKind`. It specifies 4 of 5 IEEE-754
rounding modes and a special value for dynamic rounding mode. It is used
in clang frontend.

3. `llvm::fp::RoundingMode`. Defines the same values as
`clang::LangOptions::FPRoundingModeKind` but in different order. It is
used to specify rounding mode in in IR and functions that operate IR.

4. Rounding mode representation used by `FLT_ROUNDS` (C11, 5.2.4.2.2p7).
Besides constants for rounding mode it also uses a special value to
indicate error. It is convenient to use in intrinsic functions, as it
represents platform-independent representation for rounding mode. In this
role it is used in some pending patches.

5. Values like `FE_DOWNWARD` and other, which specify rounding mode in
library calls `fesetround` and `fegetround`. Often they represent bits
of some control register, so they are target-dependent. The same names
(not values) and a special name `FE_DYNAMIC` are used in
`#pragma STDC FENV_ROUND`.

The first 4 sets of constants are target independent and could have the
same numerical representation. It would simplify conversion between the
representations. Also now `clang::LangOptions::FPRoundingModeKind` and
`llvm::fp::RoundingMode` do not contain the value for IEEE-754 rounding
direction `roundTiesToAway`, although it is supported natively on
some targets.

This change defines all the rounding mode type via one `llvm::RoundingMode`,
which also contains rounding mode for IEEE rounding direction `roundTiesToAway`.

Differential Revision: https://reviews.llvm.org/D77379
2020-04-09 13:26:47 +07:00
Pratyai Mazumder e8d1c6529b [SanitizerCoverage] sancov/inline-bool-flag instrumentation.
Summary:
New SanitizerCoverage feature `inline-bool-flag` which inserts an
atomic store of `1` to a boolean (which is an 8bit integer in
practice) flag on every instrumented edge.

Implementation-wise it's very similar to `inline-8bit-counters`
features. So, much of wiring and test just follows the same pattern.

Reviewers: kcc, vitalybuka

Reviewed By: vitalybuka

Subscribers: llvm-commits, hiraditya, jfb, cfe-commits, #sanitizers

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D77244
2020-04-08 22:43:52 -07:00
Vitaly Buka 8b1a6c0a57 [NFC][SanitizerCoverage] Simplify alignment calculation
This reverts commit e42f2a0cd8b8007c816d0e63f5000c444e29105e.
2020-04-08 22:43:52 -07:00
Johannes Doerfert cb0ecc5c33 [CallGraphUpdater] Remove dead constants before replacing a function
Dead constants might be left when a function is replaced, we can
gracefully handle this case and avoid complexity for the users who would
see an assertion otherwise.
2020-04-08 22:52:46 -05:00
Craig Topper f3d3cec648 [InstCombine] Avoid a call to deprecated version of CreateCall.
Passing a Value * to CreateCall has to call getPointerElementType
to find the type of the pointer.

In this case we can rely on the fact that Intrinsic::getDeclaration
returns a Function * and use that version of CreateCall.
2020-04-08 17:41:16 -07:00
Johannes Doerfert 0985554b70 [Attributor][NFC] Split AbstractAttributes out of Attributor.cpp
Attributor.cpp became quite big and we need to start provide structure.
The Attributor code is now in Attributor.cpp and the classes derived
from AbstractAttribute are in AttributorAttributes.cpp. Minor changes
were required but no intended functional changes.

We also minimized includes as part of this.

Reviewed By: baziotis

Differential Revision: https://reviews.llvm.org/D76873
2020-04-08 19:02:14 -05:00
Christopher Tetreault 155740cc33 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: sdesmalen, rriddle, efriedma

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77263
2020-04-08 15:15:41 -07:00
Florian Hahn bbbec71609 [DSE.MSSA] Only use callCapturesBefore for calls.
callCapturesBefore always returns ModRef , if UseInst isn't a call. As
we only call it if we already know Mod is set, this only destroys the
Must bit for non-calls.
2020-04-08 15:12:33 +01:00
Florian Hahn a6353fdf3b [DSE,MSSA] Hoist getMemoryAccess call (NFC). 2020-04-08 15:10:05 +01:00
Sanjay Patel a1c05fe20f [InstCombine] exclude bitcast of ppc_fp128 in icmp signbit fold
Based on the post-commit comments for rG0f56bbc, there might
be a problem with this transform:

(bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0

...and the ppc_fp128 data type, so conservatively bypass if we
are bitcasting a ppc_fp128.

We might be able to account for endian or other differences to
enable this for PowerPC again if that is useful.

Differential Revision: https://reviews.llvm.org/D77642
2020-04-08 08:56:19 -04:00
Max Kazantsev 7adb9e06fd [LoopLoadElim] Add test showing that LoopLoadElim doesn't work correctly with new PM 2020-04-08 17:32:03 +07:00
Kazu Hirata 91eb442fde [JumpThreading] NFC: Simplify ComputeValueKnownInPredecessorsImpl
Summary:
ComputeValueKnownInPredecessorsImpl is the main folding mechanism in
JumpThreading.cpp.  To avoid potential infinite recursion while
chasing use-def chains, it uses:

  DenseSet<std::pair<Value *, BasicBlock *>> &RecursionSet

to keep track of Value-BB pairs that we've processed.

Now, when ComputeValueKnownInPredecessorsImpl recursively calls
itself, it always passes BB as is, so the second element is always BB.

This patch simplifes the function by dropping "BasicBlock *" from
RecursionSet.

Reviewers: wmi, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77699
2020-04-07 18:37:36 -07:00
Eli Friedman 565b56a72c [NFC] Clean up uses of LoadInst constructor. 2020-04-07 16:28:53 -07:00
Daniel Sanders 1adeeabb79 Add MIR-level debugify with only locations support for now
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.

It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivelent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.

There's a few high level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
  changing the checks and still pass them. I.e. Debug info should not change
  codegen. Combining this with a strip-debug pass should enable this. The
  main issue I ran into without the strip-debug pass was instructions with MMO's and
  checks on both the instruction and the MMO as the debug-location is
  between them. I currently have a simple hack in the MIRPrinter to
  resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
  recently found that the legalizer can be unexpectedly lossy in seemingly
  simple cases (e.g. expanding one instr into many). I have a verifier
  (will be posted separately) that can be integrated with passes that use
  the observer interface and will catch location loss (it does not verify
  correctness, just that there's zero lossage). It is a little conservative
  as the line-0 locations that arise from conflicts do not track the
  conflicting locations but it can still catch a fair bit.

Depends on D77439, D77438

Reviewers: aprantl, bogner, vsk

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77446
2020-04-07 16:25:13 -07:00
Fangrui Song d2ef8c1f2c [ThinLTO] Drop dso_local if a GlobalVariable satisfies isDeclarationForLinker()
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.

If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.

The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D74751
2020-04-07 15:46:01 -07:00
Florian Hahn 6aabb109be [SCCP] Use ranges for predicate info conditions.
This patch updates the code that deals with conditions from predicate
info to make use of constant ranges.

For ssa_copy instructions inserted by PredicateInfo, we have 2 ranges:
1. The range of the original value.
2. The range imposed by the linked condition.

1. is known, 2. can be determined using makeAllowedICmpRegion. The
intersection of those ranges is the range for the copy.

With this patch, we get a nice increase in the number of instructions
eliminated by both SCCP and IPSCCP for some benchmarks:

For MultiSource, SPEC2000 & SPEC2006:

Tests: 237
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.NumInstRemoved
Program                                        base    patch   diff
 test-suite...Source/Benchmarks/sim/sim.test    10.00   71.00  610.0%
 test-suite...CFP2000/177.mesa/177.mesa.test   361.00  1626.00 350.4%
 test-suite...encode/alacconvert-encode.test   141.00  602.00  327.0%
 test-suite...decode/alacconvert-decode.test   141.00  602.00  327.0%
 test-suite...CI_Purple/SMG2000/smg2000.test   1639.00 4093.00 149.7%
 test-suite...peg2/mpeg2dec/mpeg2decode.test    75.00  163.00  117.3%
 test-suite...T2006/401.bzip2/401.bzip2.test   358.00  513.00  43.3%
 test-suite...rks/FreeBench/pifft/pifft.test    11.00   15.00  36.4%
 test-suite...langs-C/unix-tbl/unix-tbl.test     4.00    5.00  25.0%
 test-suite...lications/sqlite3/sqlite3.test   541.00  667.00  23.3%
 test-suite.../CINT2000/254.gap/254.gap.test   243.00  299.00  23.0%
 test-suite...ks/Prolangs-C/agrep/agrep.test    25.00   29.00  16.0%
 test-suite...marks/7zip/7zip-benchmark.test   1135.00 1304.00 14.9%
 test-suite...lications/ClamAV/clamscan.test   1105.00 1268.00 14.8%
 test-suite...urce/Applications/lua/lua.test   398.00  436.00   9.5%

Metric: sccp.IPNumInstRemoved
Program                                        base   patch   diff
 test-suite...C/CFP2000/179.art/179.art.test     1.00   3.00  200.0%
 test-suite...006/447.dealII/447.dealII.test   429.00 1056.00 146.2%
 test-suite...nch/fourinarow/fourinarow.test     3.00   7.00  133.3%
 test-suite...CI_Purple/SMG2000/smg2000.test   818.00 1748.00 113.7%
 test-suite...ks/McCat/04-bisect/bisect.test     3.00   5.00  66.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   165.00 255.00  54.5%
 test-suite...ediabench/gsm/toast/toast.test    18.00  27.00  50.0%
 test-suite...telecomm-gsm/telecomm-gsm.test    18.00  27.00  50.0%
 test-suite...ks/Prolangs-C/agrep/agrep.test    24.00  35.00  45.8%
 test-suite...TimberWolfMC/timberwolfmc.test    43.00  62.00  44.2%
 test-suite...encode/alacconvert-encode.test    46.00  66.00  43.5%
 test-suite...decode/alacconvert-decode.test    46.00  66.00  43.5%
 test-suite...langs-C/unix-tbl/unix-tbl.test    12.00  17.00  41.7%
 test-suite...peg2/mpeg2dec/mpeg2decode.test    31.00  41.00  32.3%
 test-suite.../CINT2000/254.gap/254.gap.test   117.00 154.00  31.6%

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76611
2020-04-07 11:09:18 +01:00
Jun Ma 46bff786bc [Coroutines] Remove alignment check in shouldBeMustTail
Differential Revision: https://reviews.llvm.org/D77362
2020-04-07 09:07:34 +08:00
Eli Friedman 3f13ee8a00 [NFC] Modernize misc. uses of Align/MaybeAlign APIs.
Use the current getAlign() APIs where it makes sense, and use Align
instead of MaybeAlign when we know the value is non-zero.
2020-04-06 17:53:04 -07:00
Eli Friedman 68b03aee1a Remove SequentialType from the type heirarchy.
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.

In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.

In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)

Differential Revision: https://reviews.llvm.org/D75661
2020-04-06 17:03:49 -07:00
Vedant Kumar 5f185a8999 [AddressSanitizer] Fix for wrong argument values appearing in backtraces
Summary:
In some cases, ASan may insert instrumentation before function arguments
have been stored into their allocas. This causes two issues:

1) The argument value must be spilled until it can be stored into the
   reserved alloca, wasting a stack slot.

2) Until the store occurs in a later basic block, the debug location
   will point to the wrong frame offset, and backtraces will show an
   uninitialized value.

The proposed solution is to move instructions which initialize allocas
for arguments up into the entry block, before the position where ASan
starts inserting its instrumentation.

For the motivating test case, before the patch we see:

```
 | 0033: movq %rdi, 0x68(%rbx)  |   | DW_TAG_formal_parameter     |
 | ...                          |   |   DW_AT_name ("a")          |
 | 00d1: movq 0x68(%rbx), %rsi  |   |   DW_AT_location (RBX+0x90) |
 | 00d5: movq %rsi, 0x90(%rbx)  |   |       ^ not correct ...     |
```

and after the patch we see:

```
 | 002f: movq %rdi, 0x70(%rbx)  |   | DW_TAG_formal_parameter     |
 |                              |   |   DW_AT_name ("a")          |
 |                              |   |   DW_AT_location (RBX+0x70) |
```

rdar://61122691

Reviewers: aprantl, eugenis

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77182
2020-04-06 15:59:25 -07:00
Daniel Sanders 15f7bc7857 Add option to limit Debugify to locations (omitting variables)
Summary:
It can be helpful to test behaviour w.r.t locations without having DEBUG_VALUE
around. In particular, because DEBUG_VALUE has the potential to change CodeGen
behaviour (e.g. hasOneUse() vs hasOneNonDbgUse()) while locations generally
don't.

Reviewers: aprantl, bogner

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77438
2020-04-06 15:04:55 -07:00
Kirill Naumov 3f995ce8b5 [CFGPrinter][CallPrinter][polly] Adding distinct structure for CFGDOTInfo
The patch introduces the system to distinctively store the information
needed for the Control Flow Graph as well as the instrumentary needed for
the follow-up changes: BlockFrequencyInfo and BranchProbabilityInfo.
The patch is a part of sequence of three patches, related to graphs Heat Coloring.

Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu

Differential Revision: https://reviews.llvm.org/D76820
2020-04-06 17:42:54 +00:00
Florian Hahn 7aba6a0333 [LV] Fix value that could be read uninitialized.
This should fix
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/18569
2020-04-06 17:54:50 +01:00
Florian Hahn 90be3c24a7 [VPlan] Introduce new VPWidenCallRecipe (NFC).
This patch moves calls to their own recipe, to simplify the transition
to VPUser for operands of VPWidenRecipe, as discussed in D76992.

Subsequently additional information can be added to the recipe rather
than computing it during the execute step.

Reviewers: rengolin, Ayal, gilr, hsaito

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77467
2020-04-06 16:07:37 +01:00
Guillaume Chatelet 808286342a [Alignment][NFC] Assume AlignmentFromAssumptions::getNewAlignment is always set.
Summary:
In D77454 we explain that `LoadInst` and `StoreInst` always have their alignment defined.
This allows to work backward here and to infer that `getNewAlignment` does not need to return `0` in case of failure.
Returning `1` also works since it needs to be greater than the Load/Store alignment which is a least `1`.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77538
2020-04-06 14:54:57 +00:00
Florian Hahn 6babae74c7 [Matrix] Update load/storeMatrix to take indices as Value* (NFC).
This allows using the functions to be used with loop dependent indices.
2020-04-06 14:48:48 +01:00
Guillaume Chatelet ff858d7781 [Alignment][NFC] Add DebugStr and operator*
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
 - DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
   - This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
 - Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77537
2020-04-06 12:09:45 +00:00
Florian Hahn 39f2d9aa81 [Matrix] Add option to use row-major matrix layout as default.
This patch adds a -matrix-default-layout option which can be used to
set the default matrix layout to row-major or column-major (default).

The initial patch updates codegen for loads, stores, binary operators
and matrix multiply.

Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76325
2020-04-06 10:00:56 +01:00
Florian Hahn d1fed7081d [Matrix] Add initial tiling for load/multiply/store chains.
This patch adds initial fusion for load/multiply/store chains of matrix
operations.

The patch contains roughly two parts:

1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store.
If they do, we create new non-aliasing copies of the operands. Note that this
may introduce new basic block. Finally we process TileSize x TileSize blocks.
That is: load tiles from the input operands, multiply and store them.

2. Identify fusion candidates & matrix instructions.
As a first step, collect all instructions with shape info and fusion candidates
(currently @llvm.matrix.multiply calls). Next, try to fuse candidates and
collect instructions eliminated by fusion. Finally iterate over all matrix
instructions, skip the ones eliminated by fusion and lower the rest as usual.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75566
2020-04-06 09:28:15 +01:00
Guillaume Chatelet 6000478f39 Revert "[Alignment][NFC] Add DebugStr and operator*"
This reverts commit 1e34ab98fc.
2020-04-06 07:55:25 +00:00
Guillaume Chatelet 1e34ab98fc [Alignment][NFC] Add DebugStr and operator*
Summary:
Also updates files to use them.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77394
2020-04-06 07:12:46 +00:00
Tarindu Jayatilaka b43b59fcc0 Expose `attributor-disable` to the new and old pass managers
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76871
2020-04-05 22:29:34 -05:00
Anna Thomas 1d0f757904 [InlineFunction] Update metadata on loads that are return values
This patch builds upon D76140 by updating metadata on pointer typed
loads in inlined functions, when the load is the return value, and the
callsite contains return attributes which can be updated as metadata on
the load.
Added test cases show this for nonnull, dereferenceable,
dereferenceable_or_null

Reviewed-By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76792
2020-04-05 14:50:10 -04:00
Sanjay Patel 538a8f0227 [InstCombine] convert bitcast-shuffle to vector trunc
As discussed in D76983, that patch can turn a chain of insert/extract
with scalar trunc ops into bitcast+extract and existing instcombine
vector transforms end up creating a shuffle out of that (see the
PhaseOrdering test for an example). Currently, that process requires
at least this sequence: -instcombine -early-cse -instcombine.

Before D76983, the sequence of insert/extract would reach the SLP
vectorizer and become a vector trunc there.

Based on a small sampling of public targets/types, converting the
shuffle to a trunc is better for codegen in most cases (and a
regression of that form is the reason this was noticed). The trunc is
clearly better for IR-level analysis as well.

This means that we can induce "spontaneous vectorization" without
invoking any explicit vectorizer passes (at least a vector cast op
may be created out of scalar casts), but that seems to be the right
choice given that we started with a chain of insert/extract, and the
backend would expand back to that chain if a target does not support
the op.

Differential Revision: https://reviews.llvm.org/D77299
2020-04-05 09:48:02 -04:00
Sanjay Patel 4036a0af24 [InstCombine] enhance freelyNegateValue() by handling 'not'
This patch extends D77230. If we have a 'not' instruction inside a
negated expression, we can ignore extra uses of that op because the
negation has a one-to-one replacement: negate becomes increment.

Alive2 examples of the test cases:
http://volta.cs.utah.edu:8080/z/T5-u9P
http://volta.cs.utah.edu:8080/z/eT89L6

Differential Revision: https://reviews.llvm.org/D77459
2020-04-05 09:16:19 -04:00
Stefanos Baziotis f3dd3a66d3 [Attributor] AAUndefinedBehavior: Use AAValueSimplify in memory accessing instructions.
Query AAValueSimplify on pointers in memory accessing instructions to take
advantage of the constant propagation (or any other value simplification) of such values.
2020-04-05 02:46:26 +03:00
Florian Hahn a2b18c5a08 [LV] Simplify tryToWiden as recipes are not re-used (NFC).
After 49d00824bb, VPWidenRecipe only stores a single instruction.
tryToWiden can simply return the widen recipe, like other helpers in
VPRecipeBuilder.
2020-04-04 18:30:50 +01:00
Nikita Popov 4ede730096 [InstCombine] Don't limit uses in eraseInstFromFunction()
eraseInstFromFunction() adds the operands of the erased instructions,
as those might now be dead as well. However, this is limited to
instructions with less than 8 operands.

This check doesn't make a lot of sense to me. As the instruction
gets removed afterwards, I don't see a potential for anything
overly pathological happening here (as we can only add those
operands to the worklist once). The impact on CTMark is in
the noise. We also have the same code in instruction sinking
and don't limit the operand count there.

Differential Revision: https://reviews.llvm.org/D77325
2020-04-04 18:37:30 +02:00
Luofan Chen eec6d87626 [Attributor] Deduce attributes for non-exact functions
This patch is based on D63312 and D63319. For now we create shallow wrappers for all functions that are IPO amendable.
See also [this github issue](https://github.com/llvm/llvm-project/issues/172).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76404
2020-04-04 11:34:58 -05:00
Nikita Popov 6896d559f3 [VNCoercion] Use IRBuilderBase; NFC
And remove include from header.
2020-04-04 12:44:50 +02:00
Nikita Popov ebd5a1b049 [Reassociate] Use IRBuilderBase; NFC
And remove now unnecessary IRBuilder.h include in header.
2020-04-04 12:34:16 +02:00
Nikita Popov 1055e9e3c8 [IVDescriptors] Remove IRBuilder.h include; NFC
IVDescriptors.h itself does not reference IRBuilder at all.
Move the include into transformation passes that do.
2020-04-04 12:07:57 +02:00
Sanjay Patel ce97ce3a5d [VectorCombine] try to form a better extractelement
Extracting to the same index that we are going to insert back into
allows forming select ("blend") shuffles and enables further transforms.

Admittedly, this is a quick-fix for a more general problem that I'm
hoping to solve by adding transforms for patterns that start with an
insertelement.

But this might resolve some regressions known to be caused by the
extract-extract transform (although I have not gotten more details on
those yet).

In the motivating case from PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724

The combination of subsequent instcombine and codegen transforms gets us this improvement:

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm4
  vmovshdup	%xmm1, %xmm3    ## xmm3 = xmm1[1,1,3,3]
  vaddps	%xmm0, %xmm2, %xmm0
  vaddps	%xmm1, %xmm3, %xmm1
  vshufps	$200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3]
  vinsertps	$177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2]

  -->

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm1
  vaddps	%xmm0, %xmm2, %xmm0
  vshufps	$200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3]

Differential Revision: https://reviews.llvm.org/D76623
2020-04-03 13:55:13 -04:00
Roman Lebedev 7d572ef2dd
Revert "[SCEV] rewriteLoopExitValues(): even if have hard uses, still rewrite if cheap (PR44668)"
As discussed in post-commit review in https://reviews.llvm.org/D73501
if the goal of this is to help vectorizer, then we should actually
be teaching vectorizer to do this, because right now this rewrite
is still budget-limited, which isn't what we'd want.

Additionally, while the rest of the patch series was universally profitable,
this particular patch is reportedly (https://reviews.llvm.org/D73501#1905171)
exposing cost-modeling issues on ARM.

So let's just back this particular patch out. Once there's an undo transform,
this could be considered for reintegration.

This reverts commit 44edc6fd2c.
2020-04-03 20:15:04 +03:00
Matt Arsenault 57a55313c3 InstCombine: Reduce minnum/maxnum if inputs are casted 2020-04-03 11:57:25 -04:00
Guillaume Chatelet 1a584a8d50 [Alignment][NFC] Remove unused private functions
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77297
2020-04-03 09:16:20 +00:00
OCHyams 9b56cc9361 [DebugInfo] Salvage debug info when sinking loop invariant instructions
Reviewed By: vsk, aprantl, djtodoro

Differential Revision: https://reviews.llvm.org/D77318
2020-04-03 09:19:26 +01:00
Hongtao Yu 88da019977 Fix a bug in the inliner that causes subsequent double inlining
Summary:
A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.

```
void top()
{
    int t = first();
    second(t);
}

void second(int t)
{
   t = third(t);
   fourth(t);
}

void third(int t)
{
   return t;
}
```
The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up.  We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.

Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.

Reviewers: wenlei, davidxl, tejohnson

Reviewed By: wenlei, davidxl

Subscribers: eraman, nikic, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76248
2020-04-02 21:08:05 -07:00
Jun Ma 9c6f32a0ff [Coroutines] Simplify implementation using removePredecessor
Differential Revision: https://reviews.llvm.org/D77035
2020-04-03 09:20:07 +08:00
Anna Thomas bf7a16a768 [InlineFunction] Update valid return attributes at callsite within callee body
Consider a callee function that has a call (C) within it which feeds
into the return. When we inline that callee into a callsite that has
return attributes, we can backward propagate valid attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

Also, this is valid only for attributes which are a property of a
callsite and not those that are not dependent on the ABI, or a property
of the call itself.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-04-02 14:13:12 -04:00
Sanjay Patel f4448063cc [InstCombine] try to reduce shuffle with bitcasted operand
shuf (bitcast X), undef, Mask --> bitcast X'

The 'inverse shuffles' test (shuf_bitcast_operand) is a pattern
in the motivating examples from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
(see also D76727)

We can deal with this class of patterns in generic instcombine
because we are not creating any new shuffles, just a bitcast.

Alive2 proof:
http://volta.cs.utah.edu:8080/z/mwDUZf

Differential Revision: https://reviews.llvm.org/D76844
2020-04-02 13:44:50 -04:00
Sanjay Patel b6050ca181 [VectorCombine] transform bitcasted shuffle to narrower elements
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'

We do not attempt this in InstCombine because we do not want to change
types and create new shuffle ops that are potentially not lowered as
well as the original code. Here, we can check the cost model to see if
it is worthwhile.

I've aggressively enabled this transform even if the types are the same
size and/or equal cost because moving the bitcast allows InstCombine to
make further simplifications.

In the motivating cases from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
...this is enough to let instcombine and the backend eliminate the
redundant shuffles, but we probably want to extend VectorCombine to
handle the inverse pattern (shuffle-of-bitcast) to get that
simplification directly in IR.

Differential Revision: https://reviews.llvm.org/D76727
2020-04-02 13:30:22 -04:00
Sanjay Patel 12fcbcecff [InstCombine] add tests for cmyk benchmark; NFC
These are versions of a function that regressed with:
rGf2fbdf76d8d0

That particular problem occurs with an instcombine-simplifycfg-instcombine
sequence, but we can show that it exists within instcombine only with
other variations of the pattern.
2020-04-02 13:00:46 -04:00
Benjamin Kramer de8831934a [LoopDataPrefetch] Remove unused include that's a layering violation 2020-04-02 17:46:10 +02:00
Benjamin Kramer dffc503187 Revert "[SimplifyLibCalls] Erase replaced instructions"
This reverts commit 2a77544ad5. This
introduces a use-after-free in Transforms/InstCombine/sincospi.ll.
Found by asan.
2020-04-02 17:30:47 +02:00
Sanjay Patel 1008435f3d Revert "[InstCombine] do not exclude min/max from icmp with casted operand fold"
This reverts commit f2fbdf76d8.
As noted in the post-commit thread:
https://reviews.llvm.org/rGf2fbdf76d8d0
...this can obscure a min/max pattern where the components
have extra uses. We can show that the problem is independent
of this change with a slightly modified source example, so
this revert just delays/reduces the need to fix the real
problem.

We need to improve our analysis of negation or -- more
generally -- subtraction using patches like D77230 or D68408.
2020-04-02 09:15:23 -04:00
Tyker c00cb76274 [NFC] Split Knowledge retention and place it more appropriatly
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77171
2020-04-02 15:01:41 +02:00
Jonas Paulsson 36d4421f50 [LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop.
This patch adds

- New arguments to getMinPrefetchStride() to let the target decide on a
  per-loop basis if software prefetching should be done even with a stride
  within the limit of the hw prefetcher.

- New TTI hook enableWritePrefetching() to let a target do write prefetching
  by default (defaults to false).

- In LoopDataPrefetch:

  - A search through the whole loop to gather information before emitting any
    prefetches. This way the target can get information via new arguments to
    getMinPrefetchStride() and emit prefetches more selectively. Collected
    information includes: Does the loop have a call, how many memory
    accesses, how many of them are strided, how many prefetches will cover
    them. This is NFC to before as long as the target does not change its
    definition of getMinPrefetchStride().

  - If a previous access to the same exact address was 'read', and the
    current one is 'write', make it a 'write' prefetch.

  - If two accesses that are covered by the same prefetch do not dominate
    each other, put the prefetch in a block that dominates both of them.

  - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.

- A SystemZ implementation of getMinPrefetchStride().

Review: Ulrich Weigand, Michael Kruse

Differential Revision: https://reviews.llvm.org/D70228
2020-04-02 14:57:46 +02:00
Florian Hahn a63b5c9e53 [CallSiteSplitting] Simplify isPredicateOnPHI & continue checking PHIs.
As pointed out by @thakis, currently CallSiteSplitting bails out after
checking the first PHI node. We should check all PHI nodes, until we
find one where call site splitting is beneficial.

This patch also slightly simplifies the code using BasicBlock::phis().

Reviewers: davidxl, junbuml, thakis

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D77089
2020-04-02 10:11:27 +01:00
Johannes Doerfert bcd8009369 [Attributor] Use the proper context instruction in genericValueTraversal
There was a TODO in genericValueTraversal to provide the context
instruction and due to the lack of it users that wanted one just used
something available. Unfortunately, using a fixed instruction is wrong
in the presence of PHIs so we need to update the context instruction
properly.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D76870
2020-04-01 22:20:47 -05:00
Johannes Doerfert ac96c8fd85 [Attributor][FIX] Do not compute ranges for arguments of declarations
This cannot be triggered right now, as far as I know, but it doesn't
make sense to deduce a constant range on arguments of declarations.
Exposed during testing of AAValueSimplify extensions.
2020-04-01 22:05:30 -05:00
Johannes Doerfert 54d6a608bf [Attributor][NFC] Predetermine the module
It could happen that we delete the first function in the SCC in the
future so we should be careful accessing `Functions` after the manifest
stage.
2020-04-01 21:56:17 -05:00
Johannes Doerfert 9e19693994 [Attributor] Derive better alignment for accessed pointers
Use DL & ABI information for better alignment deduction, e.g., if a type
is accessed and the ABI specifies an alignment requirement for such an
access we can use it. This is based on a patch by @lebedev.ri and
inspired by getBaseAlign in Loads.cpp.

Depends on D76673.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D76674
2020-04-01 21:49:57 -05:00
Johannes Doerfert b1c788d051 [Attributor][FIX] Prevent alignment breakage wrt. must-tail calls
If we have a must-tail call the callee and caller need to have matching
ABIs. Part of that is alignment which we might modify when we deduce
alignment of arguments of either. Since we would need to keep them in
sync, which is not as simple, we simply avoid deducing alignment for
arguments of the must-tail caller or callee.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D76673
2020-04-01 21:40:07 -05:00
Johannes Doerfert 41f2a57d0b [Attributor][NFC] Use a BumpPtrAllocator to allocate `AbstractAttribute`s
We create a lot of AbstractAttributes and they live as long as
the Attributor does. It seems reasonable to allocate them via a
BumpPtrAllocator owned by the Attributor.

Reviewed By: baziotis

Differential Revision: https://reviews.llvm.org/D76589
2020-04-01 20:53:28 -05:00
Sanjay Patel 3d90048791 [InstCombine] enhance freelyNegateValue() by handling xor
Negation is equivalent to bitwise-not + 1, so try to convert more
subtracts into adds using this relationship:
0 - (A ^ C) => ((A ^ C) ^ -1) + 1 => A ^ ~C + 1

I doubt this will recover the regression noted in rGf2fbdf76d8d0,
but seems like we're going to need to improve here and/or revive D68408?

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/Re5tMU
http://volta.cs.utah.edu:8080/z/An-uns

Differential Revision: https://reviews.llvm.org/D77230
2020-04-01 15:05:13 -04:00
Jonathan Roelofs 1148f004fa Fix PR45371: SeparateConstOffsetFromGEP clean up bookkeeping
find() was altering the UserChain, even in cases where it subsequently
discovered that the resulting constant was a 0. This confuses
rebuildWithoutConstOffset() when it attempts to walk the chain later, since it
is expected that the chain itself be a path down the use-def edges of an
expression.
2020-04-01 12:38:15 -06:00
Nikita Popov 50a3e8738a Revert "[InstCombine] Erase old instruction when replacing extractelements"
This reverts commit d40368fdb5.

llvm-clang-x86_64-expensive-checks-debian failure looks related.
2020-04-01 20:10:11 +02:00
Nikita Popov 2a77544ad5 [SimplifyLibCalls] Erase replaced instructions
After RAUWing an instruction, also erase it. This makes sure we
don't perform extra InstCombine iterations to clean up the garbage.
2020-04-01 20:00:10 +02:00
Uday Bondhugula 6ee11c3b0f [NewGVN] Make NewGVN aware of aligned_alloc
Make the New GVN pass aware of aligned_alloc.

Depends on D76975.

Differential Revision: https://reviews.llvm.org/D76976
2020-04-01 23:26:51 +05:30
Uday Bondhugula 4cf70af94f [GVN] Make GVN aware of aligned_alloc
Make the GVN pass aware of aligned_alloc.

Depends on D76974.

Differential Revision: https://reviews.llvm.org/D76975
2020-04-01 23:26:50 +05:30
Uday Bondhugula c4499e3333 [Attributor] Make attributor aware of aligned_alloc for heap to stack conversion
Make the attributor pass aware of aligned_alloc for converting heap
allocations to stack ones.

Depends on D76971.

Differential Revision: https://reviews.llvm.org/D76974
2020-04-01 23:26:50 +05:30
Nikita Popov d40368fdb5 [InstCombine] Erase old instruction when replacing extractelements
As we are not returning the result of replaceInstUsesWith(),
so we need to clean up ourselves.

NFC apart from worklist order.
2020-04-01 19:55:28 +02:00
Nikita Popov 4b35c816ef [InstCombine] Use replaceOperand() in div transforms
To make sure the old operand is DCEd.

NFC apart from worklist order.
2020-04-01 19:55:00 +02:00
Benjamin Kramer 66b9f5f7f0 [GVNSink] Simplify code. NFC. 2020-04-01 13:13:00 +02:00
Cullen Rhodes 84aa6cf1a9 [Transforms][SROA] Promote allocas with mem2reg for scalable types
Summary:
Aggregate types containing scalable vectors aren't supported and as far
as I can tell this pass is mostly concerned with optimisations on
aggregate types, so the majority of this pass isn't very useful for
scalable vectors.

This patch modifies SROA such that mem2reg is run on allocas with
scalable types that are promotable, but nothing else such as slicing is
done.

The use of TypeSize in this pass has also been updated to be explicitly
fixed size. When invoking the following methods in DataLayout:

    * getTypeSizeInBits
    * getTypeStoreSize
    * getTypeStoreSizeInBits
    * getTypeAllocSize

we now called getFixedSize on the resultant TypeSize. This is quite an
extensive change with around 50 calls to these functions, and also the
first change of this kind (being explicit about fixed vs scalable
size) as far as I'm aware, so feedback welcome.

A test is included containing IR with scalable vectors that this pass is
able to optimise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76720
2020-04-01 10:34:11 +00:00
Eli Friedman ba4764c2cc Fix leak in GVNSink introduced in D72467. 2020-03-31 16:21:27 -07:00
Evgenii Stepanov f9471b0010 Fix MSan false positive due to select folding.
Summary:
Select folding in JumpThreading can create a conditional branch on a
code patch that did not have one in the original program. This is not a
valid transformation in sanitize_memory functions.

Note that JumpThreading does select folding in 3 different places. Two
of them seem safe - they apply to a select instruction in a BB that ends
with an unconditional branch to another BB, which (in turn) ends with a
conditional branch or a switch with the same condition.

Fixes PR45220.

Reviewers: glider, dvyukov, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76332
2020-03-31 15:25:42 -07:00
Anna Thomas 58a05675da Revert "[InlineFunction] Handle return attributes on call within inlined body"
This reverts commit 28518d9ae3.
There is a failure in MsgPackReader.cpp when built with clang. It
complains about "signext and zeroext" are incompatible. Investigating
offline if it is infact a UB in the MsgPackReader code.
2020-03-31 16:16:34 -04:00
Nikita Popov b7fe795e5b [InstCombine] Use replaceOperand() in some select transforms
To make sure the old operand is DCEd.

NFC apart from worklist order.
2020-03-31 22:10:55 +02:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Nikita Popov c538c57d6d [InstCombine] Use replaceOperand() in descaling
To make sure the old operand gets DCEd.

NFC apart from worklist order.
2020-03-31 22:05:53 +02:00
Nikita Popov 19df7fa892 [InstCombine] Erase old alloca in cast of alloca transform
As we don't return the replaceInstUsesWith() result, we are
responsible for erasing the instruction.

NFC apart from worklist order.
2020-03-31 21:57:39 +02:00
Nikita Popov 87357808b8 [InstCombine] Use replaceOperand() in non zero phi transform
To make sure the old operand gets DCEd.

NFC apart from worklist order changes.
2020-03-31 21:54:21 +02:00
Nikita Popov f3d4166368 [InstCombine] Report change in non zero phi transform
We need to inform InstCombine (and transitively the pass manager)
that we changed an instruction.
2020-03-31 21:52:40 +02:00
Anna Thomas 28518d9ae3 [InlineFunction] Handle return attributes on call within inlined body
Consider a callee function that has a call (C) within it which feeds
into the return.  When we inline that callee into a callsite that has
return attributes, we can backward propagate those attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

See added test cases.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-03-31 14:35:40 -04:00
Uday Bondhugula dc817b2dea [InstCombine] Deduce attributes for aligned_alloc in InstCombine
Make InstCombine aware of the aligned_alloc library function.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Depends on D76970.

Differential Revision: https://reviews.llvm.org/D76971
2020-03-31 23:17:28 +05:30
Florian Hahn b0cd7b2799 [SCCP] Limit use of range info for binops to integers for now.
This fixes a crash when building the test suite.
2020-03-31 17:08:09 +01:00
Tyker 4aeb7e1ef4 [AssumeBundles] Preserve information in EarlyCSE
Summary: this patch preserve information from various places in EarlyCSE into assume bundles.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76769
2020-03-31 17:47:04 +02:00
Florian Hahn b37543750c [ValueLattice] Distinguish between constant ranges with/without undef.
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.

A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below

define i32 @f(i32 %a, i1 %c) {
  br i1 %c, label %true, label %false
true:
  %a.255 = and i32 %a, 255
  br label %exit
false:
  br label %exit
exit:
  %p = phi i32 [ %a.255, %true ], [ undef, %false ]
  %f.1 = icmp eq i32 %p, 300
  call void @use(i1 %f.1)
  %res = and i32 %p, 255
  ret i32 %res
}

In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the  second use might not be
< 256.

Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.

Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76931
2020-03-31 12:50:20 +01:00
Daan Sprenkels 464b9aeafe [InstCombine] Transform extelt-trunc -> bitcast-extelt
Canonicalize the case when a scalar extracted from a vector is
truncated.  Transform such cases to bitcast-then-extractelement.
This will enable erasing the truncate operation.

This commit fixes PR45314.

reviewers: spatel

Differential revision: https://reviews.llvm.org/D76983
2020-03-31 11:53:41 +02:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Florian Hahn 0c9c58ada0 [SCCP] Use constant ranges for casts.
For casts with constant range operands, we can use
ConstantRange::castOp.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71938
2020-03-31 09:22:04 +01:00
Wei Mi ebad678857 [SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.

Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.

Differential Revision: https://reviews.llvm.org/D76255
2020-03-30 22:07:08 -07:00
Sanjay Patel f2fbdf76d8 [InstCombine] do not exclude min/max from icmp with casted operand fold
InstCombine has a mess of logic that tries to preserve min/max patterns,
but AFAICT, this one is not necessary because we can always narrow the
corresponding select in this sequence to match the narrow compare.

The biggest danger for this patch is inducing infinite looping or
assert from exceeding max iterations. If any bots hit that in the
vicinity of this commit, this is the likely patch to blame.
2020-03-30 16:10:51 -04:00
Thomas Raoux 3ea0774b13 [ConstantFold][NFC] Compile time optimization for large vectors
Optimize the common case of splat vector constant. For large vector
going through all elements is expensive. For splatr/broadcast cases we
can skip going through all elements.

Differential Revision: https://reviews.llvm.org/D76664
2020-03-30 11:27:09 -07:00
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Vedant Kumar dcc410b5cf [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.

In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.

Uing ScalarEvolution::getTypeSizeInBits makes the truncation work as expected.

https://bugs.llvm.org/show_bug.cgi?id=45259

Differential Revision: https://reviews.llvm.org/D76669
2020-03-30 10:14:14 -07:00
Chris Jackson f6b2c003f3 [DebugInfo] Ensure that a demanded bits optimisation in
InstCombine does not result in an incorrect debuginfo variable
value

- Add an additional salvage and a test.

Reviewers: aprantl, djtodoro

Differential Revision: https://reviews.llvm.org/D76854

Bugzilla:  https://bugs.llvm.org/show_bug.cgi?id=44371
2020-03-30 15:39:22 +01:00
Chris Jackson 135709aa90 [DebugInfo] Ensure dead store elimination can mark an operand
value as undefined

    - Correct a debug info salvage and add a test

    Reviewers: aprantl, vsk

    Differential Revision: https://reviews.llvm.org/D76930
    Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45080
2020-03-30 14:58:14 +01:00
Florian Hahn 9e81249d76 [Matrix] Rename emitChainedMatrixMultiply to emitMatrixMultiply (NFC).
The Chained in the name potentially leads to confusion. Also updated the
comment to drop the unnecessary mention of tile-sized.
2020-03-30 11:17:25 +01:00
Jun Ma 31a1d85c53 [Coroutines 2/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76913
2020-03-30 09:53:09 +08:00
Jun Ma a94fa2c049 [Coroutines 1/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76911
2020-03-30 09:53:09 +08:00
Nikita Popov 8253a86b65 [InstCombine] Erase old mul when creating umulo
As we don't return the result of replaceInstUsesWith(), we are
responsible for erasing the instruction.

There is a small subtlety here in that we need to do this after
the other uses of Builder, which uses the original multiply as
the insertion point.

NFC apart from worklist order changes.
2020-03-29 20:46:08 +02:00
Nikita Popov 53d209076a [InstCombine] Use replaceOperand() in demanded elements simplification
To make sure that dead operands get DCEd. This fixes the largest
source of leftover dead operands we see in tests.

NFC apart from worklist changes.
2020-03-29 20:43:19 +02:00
Nikita Popov 0c87140065 [InstCombine] Use replaceOperand() in assoc cast simplification
To make sure the old operands are DCEd.

NFC apart from worklist order.
2020-03-29 20:28:37 +02:00
Nikita Popov a9ddcd6411 [InstCombine] Erase old add when optimizing add overflow
We don't return the replaceInstUsesWith() result, so we're
responsible for cleaning up.

NFC apart from worklist order changes.
2020-03-29 20:20:14 +02:00
Uday Bondhugula c0955edfd6 Introduce support for lib function aligned_alloc in TLI / memory builtins
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062

This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (for eg. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76970
2020-03-29 23:36:24 +05:30
Sanjay Patel fc3cc8a4b0 [VectorCombine] skip debug intrinsics first for efficiency 2020-03-29 13:58:04 -04:00
Nikita Popov 26fa33755f [InstCombine] Simplify select of cmpxchg transform
Rather than converting to a dummy select with equal true and false
ops, just directly return the resulting value.

As a side-effect, this fixes missing DCE of the previously replaced
operand.
2020-03-29 18:57:32 +02:00
Nikita Popov 28f67bd5c5 [InstCombine] Fix worklist management in varargs transform
Add a replaceUse() helper to mirror replaceOperand() for the
rare cases where we're working directly on uses.

NFC apart from worklist order changes.
2020-03-29 18:04:12 +02:00
Nikita Popov 6f07a9e80a [InstCombine] Erase original add when creating saddo
Usually when we replaceInstUsesWith() we also return the original
instruction, and InstCombine will take care of erasing it. Here
we don't do that, so we need to manually erase it.

NFC apart from worklist order changes.
2020-03-29 18:01:32 +02:00
Nikita Popov 1e363023b8 [InstCombine] Use replaceOperand() in a few more places
To make sure the old operands get DCEd.

NFC apart from worklist order changes.
2020-03-29 18:01:00 +02:00
Florian Hahn 49d00824bb [VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).
This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling it's
operands as VPValues and also towards breaking it up into a
VPInstruction.

Discussed as part of D74695.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76988
2020-03-29 13:47:28 +01:00
Richard Diamond 4bf015c035 [AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences.
Summary:
On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue.

In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64.

This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type.

Reviewers: hfinkel, jdoerfert

Reviewed By: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75471
2020-03-29 01:26:31 -05:00
Nikita Popov 2215dcf1d7 [InstCombine] Remove unreachable blocks before DCE
Dropping unreachable code may reduce use counts on other instructions,
so it's better to do this earlier rather than later.

NFC-ish, may only impact worklist order.
2020-03-28 21:19:16 +01:00
Nikita Popov 97cc1275c7 [InstCombine] Merge two functions; NFC
Merge AddReachableCodeToWorklist() into prepareICWorklistFromFunction().
It's one logical step, and this makes it easier to move code.
2020-03-28 21:19:16 +01:00
Nikita Popov 30d712103f [InstCombine] Use replaceOperand() API in GEP transforms
To make sure that replaced operands get DCEd. This drops one
iteration from gepphigep.ll, which is still not optimal.

This was the last test case performing more than 3 iterations.

NFC-ish, only worklist order should change.
2020-03-28 19:07:25 +01:00
Nikita Popov b1f78baeaa [InstCombine] Reduce code duplication in GEP of PHI transform; NFC
The `NewGEP->setOperand(DI, NewPN)` call was duplicated, and the
insertion of NewGEP is the same in both if/else, so we can extract it.
2020-03-28 19:07:25 +01:00
Nikita Popov 672e8bfbfc [InstCombine] Fix worklist management in foldXorOfICmps()
Because this code does not use the IC-aware replaceInstUsesWith()
helper, we need to manually push users to the worklist.

This is NFC-ish, in that it may only change worklist order.
2020-03-28 18:25:21 +01:00
Enna1 03bc311a16 [CorrelatedValuePropagation] Remove redundant if statement in processSelect()
This statement

    if (ReplaceWith == S) ReplaceWith = UndefValue::get(S->getType());

is introduced in https://reviews.llvm.org/rG35609d97ae89b8e13f40f4e6b9b056954f8baa83
to fix a case where unreachable code can cause select instruction
simplification to fail. In https://reviews.llvm.org/rGd10480657527ffb44ea213460fb3676a6b1300aa,
we begin to perform a depth-first walk of basic blocks. This means
we will not visit unreachable blocks. So we do not need this the
special check any more.

Differential Revision: https://reviews.llvm.org/D76753
2020-03-28 18:01:17 +01:00
Florian Hahn 81f173ed0e [SCCP] Remove LatticeVal alias now that transition is done (NFC).
The LatticeVal alias was introduced to reduce the diff size for the
transition to ValueLatticeElement, which is done now.

This patch removes the unnecessary alias and updates some very verbose
type uses with auto.
2020-03-28 15:40:24 +00:00
Florian Hahn a44bf59c93 [SCCP] Remove unused toLatticeValue helper (NFC).
LatticeVal is an alias for ValueLatticeElement and the function is not
used any longer.
2020-03-28 15:40:24 +00:00
Uday Bondhugula 06066c4003 [NFC] Attributor comment updates / cast cleanup
Minor update/fixes to comments for the Attributor pass, and dyn_cast -> cast.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76972
2020-03-28 13:36:43 +05:30
Sanjay Patel 0f56bbc1a5 [InstCombine] reduce FP-casted and bitcasted signbit check
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/bVyrko
http://volta.cs.utah.edu:8080/z/Vxpz9q
2020-03-27 17:33:59 -04:00
Sjoerd Meijer 401a324c51 [LV] Refactor widenIntOrFpInduction. NFC.
This untangles the logic in widenIntOrFpInduction in order to make more
explicit and visible how exactly the induction variable is lowered.

Differential Revision: https://reviews.llvm.org/D76686
2020-03-27 12:58:50 +00:00
Jonathan Roelofs 7a89a5d81b [InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors
Fixes https://bugs.llvm.org/show_bug.cgi?id=43665
2020-03-26 12:09:36 -06:00
Fangrui Song 4c52d51e78 [InstCombine] Fix a code-sinking bug after D73832/f1a9efabcb9b
- UserParent = PN->getIncomingBlock(*I->use_begin());
+ UserParent = PN->getIncomingBlock(*SingleUse);

The first use of I may be droppable (llvm.assume).

When compiling llvm/lib/IR/AutoUpgrade.cpp with a bootstrapped clang
with ThinLTO with minimized bitcode files, I see such a case in
the function _ZN4llvm20UpgradeIntrinsicCallEPNS_8CallInstEPNS_8FunctionE

  clang -c -fthinlto-index=AutoUpgrade.o.thinlto.bc AutoUpgrade.bc -O3

Unfortunately it is really difficult to get a minimized reproduce.
2020-03-25 22:50:53 -07:00
John McCall 9514c048d8 Use optimal layout and preserve alloca alignment in coroutine frames.
Previously, we would ignore alloca alignment when building the frame
and just use the natural alignment of the allocated type.  If an alloca
is over-aligned for its IR type, this could lead to a frame entry with
inadequate alignment for the downstream uses of the alloca.

Since highly-aligned fields also tend to produce poor layouts under a
naive layout algorithm, I've also switched coroutine frames to use the
new optimal struct layout algorithm.

In order to communicate the frame size and alignment to later passes,
I needed to set align+dereferenceable attributes on the frame-pointer
parameter of the resume function.  This is clearly the right thing to
do, but the align attribute currently seems to result in assumptions
being added during inlining that the optimizer cannot easily remove.
2020-03-26 00:51:09 -04:00
Tyker f1a9efabcb Ignore/Drop droppable uses for code-sinking in InstCombine
Summary:
This patch allows code-sinking in InstCombine to be performed when instruction have uses in llvm.assume.

Use are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use.
for now uses are considered droppable if they are in an llvm.assume.

Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73832
2020-03-25 20:42:52 +01:00
Alina Sbirlea 3abcbf9903 [CFG/BasicBlock] Rename succ_const to const_succ. [NFC]
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.

Reviewers: nicholas, dblaikie, nlewycky

Subscribers: hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75952
2020-03-25 12:40:55 -07:00
Gil Rapaport 078c863305 [LV] Replace stored value with a VPValue (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as
the stored value. The recipe is generated with a VPValue wrapping the stored
value of the scalar store. This reduces ingredient def-use usage by ILV as a
step towards full VPlan-based def-use relations.

Differential Revision: https://reviews.llvm.org/D76373
2020-03-25 19:36:55 +02:00
Tyker d72c586aeb [NFC] Rename function to match Coding Convention and fix typo in KnowledgeRetention 2020-03-25 18:31:13 +01:00
Johannes Doerfert 5699d08b79 [Attributor] Use knowledge retained in llvm.assume (operand bundles)
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.

[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D74888
2020-03-24 15:33:40 -05:00
Juneyoung Lee 49f75132bc [DivRemPairs] Freeze operands if they can be undef values
Summary:
DivRemPairs is unsound with respect to undef values.

```
      // bb1:
      //   %rem = srem %x, %y
      // bb2:
      //   %div = sdiv %x, %y
      // -->
      // bb1:
      //   %div = sdiv %x, %y
      //   %mul = mul %div, %y
      //   %rem = sub %x, %mul
```

If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:
```
   %div = sdiv undef, 1 // %div = undef
   %rem = srem undef, 1 // %rem = 0
 =>
   %div = sdiv undef, 1 // %div = undef
   %mul = mul %div, 1   // %mul = undef
   %rem = sub %x, %mul  // %rem = undef - undef = undef
```
http://volta.cs.utah.edu:8080/z/m7Xrx5

Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.

This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .

This miscompilation disappears if undef value is removed, but it may take a while.
DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations.

Reviewers: spatel, lebedev.ri, george.burgess.iv

Reviewed By: spatel

Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76483
2020-03-25 03:46:14 +09:00
Jun Ma a44de12ab2 [Coroutines] Also check lifetime intrinsic for local variable when build
coroutine frame

Currently we move all allocas into the frame when build coroutine frame in
CoroSplit pass. However, this can be relaxed.

Since CoroSplit pass run after Inline pass, we can use lifetime intrinsic to
do such analysis: If the scope of lifetime intrinsic is not across any suspend
point, rather than move the allocas to frame, we can just move them to entry bb
of corresponding function. This reduce the frame size.

More importantly, this also avoid data race in multithread environment.
Consider one inline function by coroutine: it starts a thread which access
local variables, while after inline the movement of allocs to frame also access
them. cause data race.

Differential Revision: https://reviews.llvm.org/D75664
2020-03-24 13:41:55 +08:00
Vedant Kumar b7cd291c15 [GlobalOpt] Treat null-check of loaded value as use of global (PR35760)
PR35760 shows an example program which, when compiled with `clang -O0`
or gcc at any optimization level, prints '0'. However, llvm transforms
the program in a way that causes it to print '1'.

Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when
analyzing a load from a global which is used by an `icmp`. This special
case was untested [0] so this is just deleting dead code.

An alternative fix might be to change the GlobalStatus analysis for the
global to report "Stored" instead of "StoredOnce". However, "StoredOnce"
is appropriate when only one value other than the initializer is stored
to the global.

[0]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662

Differential Revision: https://reviews.llvm.org/D76645
2020-03-23 22:36:09 -07:00
Johannes Doerfert f09f4b2676 [OpenMPOpt] Initialize value to avoid use of uninitialized memory
This should fix the issue reported here:
  https://reviews.llvm.org/D76058#1937554
2020-03-23 19:17:19 -05:00
Matt Arsenault b20a1d840f GVNSink: Allow handling addrspacecast 2020-03-23 16:50:58 -04:00
Stefanos Baziotis a650d555fc [Attributor][NFC] Refactorings and typos in doc
Reviewed By: sstefan1, uenoku

Differential Revision: https://reviews.llvm.org/D76175
2020-03-23 22:44:10 +02:00
Matt Arsenault 43d98a0ecf Allow replacing intrinsic operands with variables
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
2020-03-23 15:51:57 -04:00
Sanjay Patel a1fe6beb1e [InstCombine] remove one-use check for ctpop -> cttz
Two one-use checks were added with rGfdcb27105537,
but only the first one is necessary to limit an
increase in instruction count. The second transform
only creates one instruction, so it is always a
reasonable canonicalization/optimization.
2020-03-23 13:59:57 -04:00
Johannes Doerfert 9d38f98dc3 [OpenMPOpt] Validate declaration types against the expected types
Validation of the found runtime library functions declarations types
(return and argument types) with the expected types.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76058
2020-03-23 11:43:36 -05:00
Benjamin Kramer ff2f5097ed [Attributor] Fold single-use variable into assert
Fixes unused variable warning in Release builds.
2020-03-23 17:41:52 +01:00
Johannes Doerfert c57689bef2 [Attributor][NFC] Copy llvm::function_ref, don't use references
On IRC this was called a "code smell" so we get rid of it.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 68fed27067 [Attributor] Handle calls in AAValueConstantRange properly
We did handle calls that were operands of certain instructions but not
standalone calls we visit via indirection, e.g., selects.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 54ec9b54f6 [Attributor] Unify handling of must-tail calls
We special cased must-tail calls all over the place because they cannot
be modified as other calls can be. However, we already centralized the
modification API so we can centralize the handling as well. This
simplifies the code and allows to remove must-tail calls completely.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 0995001ce5 [Attributor][NFC] Predetermine the module before verification
It could happen that we delete the first function in the SCC in the
future so we should be careful accessing `Functions` after the manifest
stage.
2020-03-23 10:45:23 -05:00
Johannes Doerfert f3bf4b05c2 [Attributor][NFC] clang-format Attributor.{h,cpp} 2020-03-23 10:45:23 -05:00
Simon Pilgrim fdcb271055 [InstCombine] Limit CTPOP -> CTTZ simplifications to one use
Tweak D76568 so we only combine if it will remove the bit-twiddling.

Suggested by @spatel
2020-03-23 14:33:41 +00:00
Guillaume Chatelet 32851f8d63 [Alignment][NFC] Deprecate VectorUtils::getAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76542
2020-03-23 13:54:15 +01:00
Simon Pilgrim 72d1419bfb [InstCombine] Add CTPOP -> CTTZ simplifications (PR43513)
As detailed on PR43513, we can simplify:

ctpop(x | -x) -> bitwidth - cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/caw49X

ctpop(~x & (x - 1)) -> cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/5zfVrx

I've tweaked the initial test cases I added at rG2d712fb75584 to increase commutativity testing.

Differential Revision: https://reviews.llvm.org/D76568
2020-03-23 11:04:33 +00:00
Nikita Popov dc81923659 [InstCombine] Remove ExpensiveCombines option
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)

Differential Revision: https://reviews.llvm.org/D76540
2020-03-22 16:56:28 +01:00
Matt Arsenault 830cfda19f Utils: Mostly convert memcpy expansion to use Align
The TTI hooks aren't converted. I also think the intrinsics should
have mandatory alignment and never return MaybeAlign.
2020-03-22 11:21:44 -04:00
Nikita Popov a63eaa5449 [SLP] Avoid repeated visitation in getVectorElementSize(); NFC
We need to insert into the Visited set at the same time we insert
into the worklist. Otherwise we may end up pushing the same
instruction to the worklist multiple times, and only adding it to
the visited set later.
2020-03-22 14:34:29 +01:00
Simon Pilgrim f00a4b531a [InstCombine][X86] simplifyX86immShift - remove ConstantAggregateZero handling. NFC.
The llvm::computeKnownBits path now handles this.
2020-03-21 11:30:44 +00:00
Nikita Popov 2b52e4e629 [InstCombine] Remove known bits constant folding
If ExpensiveCombines is enabled (which is the case with -O3 on the
legacy PM and always on the new PM), InstCombine tries to compute
the known bits of all instructions in the hope that all bits end up
being known, which is fairly expensive.

How effective is it? If we add some statistics on how often the
constant folding succeeds and how many KnownBits calculations are
performed and run test-suite we get:

    "instcombine.NumConstPropKnownBits": 642,
    "instcombine.NumConstPropKnownBitsComputed": 18744965,

In other words, we get one fold for every 30000 KnownBits calculations.
However, the truth is actually much worse: Currently, known bits are
computed before performing other folds, so there is a high chance
that cases that get folded by known bits would also have been
handled by other folds.

What happens if we compute known bits after all other folds
(hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)?

    "instcombine.NumConstPropKnownBits": 0,
    "instcombine.NumConstPropKnownBitsComputed": 18105547,

So it turns out despite doing 18 million known bits calculations,
the known bits fold does not do anything useful on test-suite.
I was originally planning to move this into AggressiveInstCombine
so it only runs once in the pipeline, but seeing this, I think
we're better off removing it entirely.

As this is the only use of the "expensive combines" mechanism,
it may be removed afterwards, but I'll leave that to a separate patch.

Differential Revision: https://reviews.llvm.org/D75801
2020-03-20 20:54:06 +01:00
Nikita Popov 3205d1a860 [InstCombine] Handle known shl nsw sign bit in SimplifyDemanded
Ideally SimplifyDemanded should compute the same known bits as
computeKnownBits(). This patch addresses one discrepancy, where
ValueTracking is more powerful: If we have a shl nsw shift, we
know that the sign bit of the input and output must be the same.
If this results in a conflict, the result is poison.

This is implemented in
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L1175-L1179)
and
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L904-L908).

This implements the same basic logic in SimplifyDemanded. It's
slightly stronger, because I return undef instead of zero for the
poison case (which is not an option inside ValueTracking).

As mentioned in https://reviews.llvm.org/D75801#inline-698484,
we could detect poison in more cases, this just establishes parity
with the existing logic.

Differential Revision: https://reviews.llvm.org/D76489
2020-03-20 18:16:05 +01:00
Simon Pilgrim 34659de5fd [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391)
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.

This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
2020-03-20 15:48:06 +00:00
Nikita Popov 0372768776 [InstCombine] Simplify calls with "returned" attribute
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. This was already partially
handled by InstSimplify/InstCombine for the case where the argument
is an integer constant, and the result is thus known via known bits.
The non-constant (or non-int) argument cases weren't handled though.

This previously landed as an InstSimplify transform, but was reverted
due to assertion failures when compiling the Linux kernel. The reason
is that simplifying a call to another call breaks assumptions in
call graph updating during inlining. As the code is not easy to fix,
and there is no particularly strong motivation for having this in
InstSimplify, the transform is only performed in InstCombine instead.

Differential Revision: https://reviews.llvm.org/D75815
2020-03-20 10:23:39 +01:00
Nikita Popov 5c10967157 [InstCombine] Don't replace musttail result based on known bits
This is the same change as D75824, but for two cases where
InstCombine performs the same optimization: Replacing an instruction
whose bits are fully known with a constant. This is not (generally)
legal for musttail calls.

Differential Revision: https://reviews.llvm.org/D76457
2020-03-20 10:17:09 +01:00
Florian Hahn be86bc76f0 [Matrix] Generalize ColumnMatrixTy to MatrixTy (NFC).
This patch sets the stage for supporting both row and column major
layouts for matrixes. It renames ColumnMatrixTy to MatrixTy, adds
booleans indicating the underlying layout to both MatrixTy and ShapeInfo
and generalizes the methods of MatrixTy to support both row and column
major layouts.

Reviewers: Gerolf, anemet, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76324
2020-03-20 08:32:13 +00:00
Florian Hahn 3a8372ed02 [DSE] Support traversing MemoryPhis.
For MemoryPhis, we have to avoid that the MemoryPhi may be executed
before before the access we are currently looking at.

To do this we do a post-order numbering of the basic blocks in the
function and bail out once we reach a MemoryPhi with a larger (or equal)
post-order block number than the current MemoryAccess.
This changes the order in which we visit stores for elimination.

This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration.  For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72148
2020-03-20 07:51:42 +00:00
Jun Ma 032251e34d [Coroutines] Fix PR45130
For now, when final suspend can be simplified by simplifySuspendPoint,
handleFinalSuspend is executed as well to remove last case in switch
instruction. This patch fixes it.

Differential Revision: https://reviews.llvm.org/D76345
2020-03-20 11:27:08 +08:00
Benjamin Kramer 1db8b341a6 [Matrix] Fold single-use variable into assert
Avoids -Wunused-variable warnings in Release builds.
2020-03-19 21:42:22 +01:00
Florian Hahn 796fb2e474 [Matrix] Move multiply-add code generation into separate function (NFC).
This logic can be shared with the tiled code generation.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75565
2020-03-19 20:26:19 +00:00
Kazu Hirata e23d786526 [JumpThreading] Fix infinite loop (PR44611)
Summary:
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by
preventing an infinite loop in the jump threading pass when
-jump-threading-across-loop-headers is on.  Specifically, without this
patch, jump threading through two basic blocks would trigger on the
same area of the CFG over and over, resulting in an infinite loop.

Consider testcase PR44611-across-header-hang.ll in this patch.  The
first opportunity to thread through two basic blocks is:

  from bb_body2 through bb_header and bb_body1 to bb_body2.

The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1
and bb_body1.thread1.  Since bb_header contains a successor edge back
to itself, bb_header.thread1 also contains a successor edge to
bb_header, immediately giving rise to the next jump threading
opportunity:

  from bb_header.thread1 through bb_header and bb_body1 to bb_body2.

After that, we repeatedly thread an incoming edge into bb_header
through bb_header and bb_body1 to bb_body2.  In other words, we keep
peeling one iteration from bb_header's self loop.

The patch fixes the problem by preventing the pass from duplicating a
basic block containing a self loop.

Reviewers: wmi, junparser, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76390
2020-03-19 12:49:36 -07:00
Florian Hahn 0cc2d23751 [Matrix] Hoist load/store generation logic, add helpers for tiled access.
This patch slightly generalizes the code to emit loads and stores of a
matrix and adds helpers to load/store a tile of a larger matrix.

This will be used in a follow-up patch introducing initial tiling.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75564
2020-03-19 19:28:21 +00:00
Simon Pilgrim a11e5b32df [InstCombine][X86] simplifyX86immShift - handle variable out-of-range vector shift by immediate amounts (PR40391)
If we know the SSE shift amount is out of range then we can simplify to zero value (logical) or a 'signsplat' bitwidth-1 shift (arithmetic). This allows us to remove the equivalent ConstantInt constant folding path from simplifyX86immShift.
2020-03-19 18:27:31 +00:00
Simon Pilgrim 433897da4a [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by immediate amounts to generic shifts (PR40391)
The slli/srli/srai 'immediate' vector shifts (although its not immediate anymore to match gcc) can be replaced with generic shifts if the shift amount is known to be in range.
2020-03-19 15:44:24 +00:00
Florian Hahn 4a58996dd2 [SCCP] Use constant ranges for PHI nodes.
For PHIs with multiple incoming values, we can improve precision by
using constant ranges for integers. We can over-approximate phis
by merging the incoming values.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71933
2020-03-19 12:45:33 +00:00
Florian Hahn 8a36594a7e [SCCP] Use constant ranges for binary operators.
If one of the operands of a binary operator is a constant range, we can
use ConstantRange::binaryOp to approximate the result.

We still handle single element constant ranges as we did previously,
with ConstantExpr::get(), because ConstantRange::binaryOp still gives
worse results in a few cases for single element ranges.

Also note that we bail out early if any of the operands is still unknown.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71936
2020-03-19 09:35:48 +00:00
Huihui Zhang 2ea5495759 [InstCombine][SVE] Fix InstCombiner::visitAllocaInst for scalable vector.
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where scalable
property doesn't matter (check for zero-sized alloca), we should explicitly
call getKnownMinSize() to avoid implicit type conversion to uint64_t, which is
invalid for scalable vector type.

Reviewers: sdesmalen, efriedma, spatel, apazos

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76386
2020-03-18 20:57:14 -07:00
Florian Hahn fd2c15e602 [VPlan] Do not print mapping for Value2VPValue.
The latest improvements to VPValue printing make this mapping clear when
printing the operand. Printing the mapping separately is not required
any longer.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76375
2020-03-18 21:44:07 +00:00
Florian Hahn 00c1cd1934 [VPlan] Record underlying value for VPValues created by addVPValue (NFC).
Now that printing VPValues uses the underlying IR value name, if
available, recording the underlying value here improves printing.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76374
2020-03-18 21:30:58 +00:00
Eli Friedman e24e95fe90 Remove CompositeType class.
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.

Differential Revision: https://reviews.llvm.org/D75660
2020-03-18 13:53:17 -07:00
Florian Hahn e6a74803d4 [VPlan] Use underlying value for printing, if available.
When the an underlying value is available, we can use its name for
printing, as discussed in D73078.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76200
2020-03-18 17:46:57 +00:00
Simon Pilgrim f4e495a18e [InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391)
AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
2020-03-18 11:26:54 +00:00
Florian Hahn 5672ae8d86 [SCCP] Use constant ranges for select, if cond is overdefined.
For selects with an unknown condition, we can approximate the result by
merging the state of both options. This automatically takes care of
the case where on operand is undef.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71935
2020-03-18 09:26:02 +00:00
Michael Liao f2f8bdc2b1 Fix `-Wunused-variable` warning. NFC. 2020-03-17 20:15:50 -04:00
Florian Hahn a72ae99cf9 [SCCP] Split up callsite handling, only propagate result on change (NFC)
Functions include their arguments in the use-list. Changed function
values mean that the result of the function changed. We only need
to update the call sites with the new function result and do not
have to propagate the call arguments.

To do so, this patch splits up the visitCallSite into handleCallResult
and handleCallArguments and updates markUsersAsChanged to only update
call results for functions.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75846
2020-03-17 20:05:35 +00:00
Sanjay Patel be9e3d9416 [InstCombine] reduce demand-limited bool math to logic, part 2
Follow-on suggested in:
D75961
2020-03-17 15:18:18 -04:00
Tyker e8ac825f5b [AssumeBundles] Detection of Empty bundles
Summary: Prevent InstCombine from removing llvm.assume for which the arguement is true when they have operand bundles with usefull information.

Reviewers: jdoerfert, nikic, lebedev.ri

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76147
2020-03-17 15:50:15 +01:00
Florian Hahn 1d6f919df2 [SCCP] Explicitly mark values as overdefined (NFC).
This was part of D60582 but can be committed separately.
2020-03-17 12:13:30 +00:00
Roman Lebedev 398b497cd0
[NFC] LoopRotate: do issue debug message when not rotating due to instr count
It is somewhat problematic to notice this issue otherwise.
2020-03-17 09:26:09 +03:00
Serguei Katkov 80c351cdb6 [InstCombine] Transform to undef incorrect atomic unordered mem intrinsics
According to LangRef:
If len is not a positive integer multiple of element_size, then the behaviour of the intrinsic is undefined.

Add InstCombine rule to transform intrinsic to undef operation.

This is a follow-up for D76116.

Reviewers: reames
Reviewed By: reames
Subscribers: hiraditya, jfb, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D76215
2020-03-17 10:20:16 +07:00
Matt Arsenault b0bdb186f5 Utils: Always set alignment when expanding mem intrinsics
This was creating natural aligned loads and stores, which may not be
the case. The target could request a wider type load with less
alignment.
2020-03-16 14:34:29 -04:00
Matt Arsenault 05e7d8d6ce TTI: Add addrspace parameters to memcpy lowering functions 2020-03-16 14:34:29 -04:00
Florian Hahn 4878aa36d4 [ValueLattice] Add new state for undef constants.
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.

The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces  the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.

Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.

Reviewers: efriedma, reames, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75120
2020-03-14 17:19:59 +00:00
Whitney Tsang aca7167535 [NFC][LoopUnrollAndJam] clang-format.
I am currently working on this file.
2020-03-14 00:04:10 +00:00
Akira Hatanaka c6f1713c46 [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

This reapplies the patch in https://reviews.llvm.org/rG1f5b471b8bf4,
which was reverted because it was causing crashes.

https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2

Check that HasSafePathToCall is true before checking the call is a tail
call.

Original commit message:

Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-13 13:52:14 -07:00
Alexey Zhikhartsev f71abec661 [LoopInterchange] Fix interchanging contents of preheader BBs
Summary:
Previously LCSSA was getting broken by placing instructions into the
(newly) inner *header* instead of the *pre*header.

Fixes PR43474

Reviewers: fhahn

Reviewed By: fhahn

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75943
2020-03-13 15:59:37 -04:00
Reid Kleckner 478b06e687 Revert "[ObjC][ARC] Check the basic block size before calling DominatorTree::dominate"
This reverts commit 5c3117b0a9

This should not be necessary after
7593a480db, and Florian Hahn has confirmed
that the problem no longer reproduces with this patch.

I happened to notice this code because the FIXME talks about
OrderedBasicBlock.

Reviewed By: fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D76075
2020-03-13 11:57:55 -07:00
Huihui Zhang fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.

This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Sanjay Patel 94f5d73182 [SimplifyCFG] fix formatting; NFC 2020-03-13 14:12:28 -04:00
Sanjay Patel 51e53af11c [SimplifyCFG] fix debug print formatting; NFC 2020-03-13 14:12:28 -04:00
Florian Hahn 0c5b6e2ea5 Recommit "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This patch should fix the cause of the stage2 failures and
PR45185.

This reverts the revert commit c52f839e72.
2020-03-13 17:03:22 +00:00
Tyker 69375fd0a3 [AssumeBundles] Preserve Information in the inliner
Summary:
during inling Create and insert an llvm.assume with attributes to preserve them.
to prevent any changes for now generation of llvm.assume is under a flag disabled by default.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75825
2020-03-13 17:35:47 +01:00
omarahmed1111 b285b333dc [Attributor] Detect possibly unbounded cycles in functions
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.

Reviewed By: jdoerfert, uenoku, baziotis

Differential Revision: https://reviews.llvm.org/D74691
2020-03-13 11:17:33 -05:00
Pankaj Gode bf990530ae [Attributor] Improve noalias preservation using reachability
Resolution for below fixme:
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
             check only uses possibly executed before this callsite.

Propagates caller argument's noalias attribute to callee.

Reviewed by: jdoerfert, uenoku

Reviewers: jdoerfert, sstefan1, uenoku

Subscribers: uenoku, sstefan1, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D71617
2020-03-13 21:09:08 +05:30
Sanjay Patel cbeffa3f6c [SimplifyCFG] convert if-else chain to switch; NFC
Fix formatting of related function names while changing the code.
2020-03-13 10:28:41 -04:00
Nico Weber 86eb2c3991 Revert "[ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't"
This reverts commit 1f5b471b8b.
Causes asserts when building code with arc. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2
for a full repro. Will post a creduced repro once creduce is done
running.
2020-03-13 10:16:02 -04:00
Johannes Doerfert a198adb490 [Attributor] IPO across definition boundary of a function marked alwaysinline
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D75590
2020-03-13 01:06:12 -05:00
rathod-sahaab 263c4a3c75 Fix compiler warning when compiling without asserts
This patch aims to prevent warning-as-error failures in release build.
As suggested in this comment
https://reviews.llvm.org/D69930#1910922

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D75970
2020-03-13 00:26:49 -05:00
Huihui Zhang 118abf2017 [SVE] Update API ConstantVector::getSplat() to use ElementCount.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.

This change is needed for D73753.

Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74386
2020-03-12 13:22:41 -07:00
Florian Hahn c52f839e72 Revert "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This commit is likely causing clang-with-lto-ubuntu to fail
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/16052

Also causes PR45185.

This reverts commit f1ac5d2263.
2020-03-12 18:49:11 +00:00
Hideto Ueno d9bf79f4e9 [Attributor][FIX] Add a missing dependence track in noalias deduction 2020-03-12 15:27:35 +00:00
Florian Hahn f1ac5d2263 [SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)
This patch switches SCCP to use ValueLatticeElement for lattice values,
instead of the local LatticeVal, as first step to enable integer range support.

This patch does not make use of constant ranges for additional operations
and the only difference for now is that integer constants are represented by
single element ranges. To preserve the existing behavior, the following helpers
are used

* isConstant(LV): returns true when LV is either a constant or a constant range with a single element. This should return true in the same cases where LV.isConstant() returned true previously.
* getConstant(LV): returns a constant if LV is either a constant or a constant range with a single element. This should return a constant in the same cases as LV.getConstant() previously.
* getConstantInt(LV): same as getConstant, but additionally casted to ConstantInt.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60582
2020-03-12 12:03:06 +00:00
Max Kazantsev 3dc6e53c97 [LoopPeel] Turn incorrect assert into a check
Summary:
This patch replaces incorrectt assert with a check. Previously it asserts that
if SCEV cannot prove `isKnownPredicate(A != B)`, then it should be able to prove
`isKnownPredicate(A == B)`.

Both these fact may be not provable. It is shown in the provided test:

Could not prove: `{-294,+,-2}<%bb1> !=  0`
Asserting: `{-294,+,-2}<%bb1> == 0`

Obviously, this SCEV is not equal to zero, but 0 is in its range so we cannot
also prove that it is not zero.

Instead of assert, we should be checking the required conditions explicitly.

Reviewers: lebedev.ri, fhahn, sanjoy, fedor.sergeev
Reviewed By: lebedev.ri
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76050
2020-03-12 17:23:07 +07:00
Sanjay Patel fae900921b [InstCombine] reduce demand-limited bool math to logic
The cmp math test is inspired by memcmp() patterns seen in D75840.
I know there's at least 1 related fold we can do here if both
values are sext'd, but I'm not seeing a way to generalize further.

We have some other bool math patterns that we want to reduce, but
that might require fixing the bogus transforms noted in D72396.

Alive proof translations of the regression tests:
https://rise4fun.com/Alive/zGWi

  Name: demand add 1
  %xz = zext i1 %x to i32
  %ys = sext i1 %y to i32
  %sub = add i32 %xz, %ys
  %r = lshr i32 %sub, 31
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = zext i1 %and to i32

  Name: demand add 2
  %xz = zext i1 %x to i5
  %ys = sext i1 %y to i5
  %sub = add i5 %xz, %ys
  %r = and i5 %sub, 16
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = select i1 %and, i5 -16, i5 0

  Name: demand add 3
  %xz = zext i1 %x to i8
  %ys = sext i1 %y to i8
  %a = add i8 %ys, %xz
  %r = ashr i8 %a, 7
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = sext i1 %and to i8

  Name: cmp math
  %gt = icmp ugt i32 %x, %y
  %lt = icmp ult i32 %x, %y
  %xz = zext i1 %gt to i32
  %yz = zext i1 %lt to i32
  %s = sub i32 %xz, %yz
  %r = lshr i32 %s, 31
  =>
  %r = zext i1 %lt to i32

Differential Revision: https://reviews.llvm.org/D75961
2020-03-11 15:45:58 -04:00
Florian Hahn bc6c8c4bbb [Matrix] Add remark propagation along the inlined-at chain.
This patch adds support for propagating matrix expressions along the
inlined-at chain and emitting remarks at the traversed function scopes.

To motivate this new behavior, consider the example below. Without the
remark 'up-leveling', we would only get remarks in load.h and store.h,
but we cannot generate a remark describing the full expression in
toplevel.cpp, which is the place where the user has the best chance of
spotting/fixing potential problems.

With this patch, we generate a remark for the load in load.h, one for
the store in store.h and one for the complete expression in
toplevel.cpp. For a bigger example, please see remarks-inlining.ll.

    load.h:
    template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) {
      Matrix<Ty, R, C> Result;
      Result.value = *reinterpret_cast <typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
      return Result;
    }

    store.h:
    template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
       *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
    }

    toplevel.cpp
    void test(double *A, double *B, double *C) {
      store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
    }

For a given function, we traverse the inlined-at chain for each
matrix instruction (= instructions with shape information). We collect
the matrix instructions in each DISubprogram we visit. This produces a
mapping of DISubprogram -> (List of matrix instructions visible in the
subpogram). We then generate remarks using the list of instructions for
each subprogram in the inlined-at chain. Note that the list of instructions
for a subprogram includes the instructions from its own subprograms
recursively. For example using the example above, for the subprogram
'test' this includes inline functions 'load' and 'store'. This allows
surfacing the remarks at a level useful to users.

Please note that the current approach may create a lot of extra remarks.
Additional heuristics to cut-off the traversal can be implemented in the
future. For example, it might make sense to stop 'up-leveling' once all
matrix instructions are at the same debug location.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D73600
2020-03-11 17:40:08 +00:00
Anna Welker a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Fangrui Song a0c0389ffb [SimplifyLibcalls] Don't replace locked IO (fgetc/fgets/fputc/fputs/fread/fwrite) with unlocked IO (*_unlocked)
This essentially reverts some of the SimplifyLibcalls part changes of D45736 [SimplifyLibcalls] Replace locked IO with unlocked IO.

C11 7.21.5.2 The fflush function

> If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above.

i.e. fopen'ed FILE* is inherently captured.

POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking

> These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the ( FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions.

After a thread fopen'ed a FILE*, when it is calling foobar() which is now replaced by foobar_unlocked(),
if another thread is concurrently calling fflush(0), the behavior is undefined.

C11 7.22.4.4 The exit function

> Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.

The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called.
See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html
for how the replacement makes libc interceptors difficult to implement.

dalias: in a worst case, it's unbounded data corruption because of concurrent access to pointers
without synchronization.  f->wpos or rpos could get outside of the buffer, thread A could do
f->wpos += j after knowing j is in bounds, while thread B also changes it concurrently.

This can produce exploitable conditions depending on libc internals.

Revert the SimplifyLibcalls part change because the cons obviously
overweigh the pros.  Even when the replacement is feasible, the benefit
is indemonstrable, more so in an application instead of an artificial
glibc benchmark.  Theoretically the replacement could be beneficial when
calling getc_unlocked/putc_unlocked in a loop, but then it is better
using a blocked IO operation and the user is likely aware of that.

The function attribute inference is still useful and thus kept.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D75933
2020-03-10 11:11:58 -07:00
Benjamin Kramer 247a177cf7 Give helpers internal linkage. NFC. 2020-03-10 18:27:42 +01:00
Tyker a4cde9ad7b Fixed [AssumeBundles] Move to IR so it can be used by Analysis
This is a recommit of 57c964aaa7
after fixing modules build.
2020-03-10 18:02:39 +01:00
Simon Moll d871ef4e6a [instcombine] remove fsub to fneg hacks; only emit fneg
Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp
negation. This also extends the scalarization cost in instcombine for unary
operators to result in the same IR rewrites for fneg as for the idiom.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D75467
2020-03-10 16:57:02 +01:00
Florian Hahn c8c14d979a [InstCombine] Support vectors in SimplifyAddWithRemainder.
SimplifyAddWithRemainder currently also matches for vector types, but
tries to create an integer constant, which causes a crash.

By using Constant::getIntegerValue() we can support both the scalar and
vector cases.

The 2 added test cases crash without the fix.

Reviewers: spatel, lebedev.ri

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D75906
2020-03-10 14:29:40 +00:00
Jonas Paulsson c2dafe12dc [SimplifyCFG] Skip merging return blocks if it would break a CallBr.
SimplifyCFG should not merge empty return blocks and leave a CallBr behind
with a duplicated destination since the verifier will then trigger an
assert. This patch checks for this case and avoids the transformation.

CodeGenPrepare has a similar check which also has a FIXME comment about why
this is needed. It seems perhaps better if these two passes would eventually
instead update the CallBr instruction instead of just checking and avoiding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=45062.

Review: Craig Topper

Differential Revision: https://reviews.llvm.org/D75620
2020-03-10 14:59:13 +01:00
Sanjay Patel 467eec0910 [InstCombine] fold gep-of-select-of-constants (PR45084)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=45084
...we failed to combine a gep with constant indexes with a
pointer operand that is a select of constants.

Differential Revision: https://reviews.llvm.org/D75807
2020-03-10 09:25:13 -04:00
Florian Hahn 2d6ecf4648 [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
ahatanak 1f5b471b8b [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-09 13:21:38 -07:00
Nikita Popov c3ca6876ed [InstCombine] Don't simplify calls without uses
When simplifying a call without uses, replaceInstUsesWith() is
going to do nothing, but we'll skip all following folds. We can
only run into this problem with calls that both simplify and are
not trivially dead if unused, which currently seems to happen only
with calls to undef, as the test diff shows. When extending
SimplifyCall() to handle "returned" attributes, this becomes a much
bigger problem, so I'm fixing this first.

Differential Revision: https://reviews.llvm.org/D75814
2020-03-09 18:47:46 +01:00
Jonas Devlieghere 882f589e20 Revert "[AssumeBundles] Move to IR so it can be used by Analysis"
This breaks the modules build:

http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/
http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/

This reverts commit 57c964aaa7.
2020-03-09 09:02:47 -07:00
evgeny 6d2032e259 [WPD] Provide a way to prevent functions from being devirtualized
Differential revision: https://reviews.llvm.org/D75617
2020-03-09 14:05:15 +03:00
Hideto Ueno bdcbdb4848 [Attributor] Deduction based on path exploration
This patch introduces the propagation of known information based on path exploration.
For example,
```
int u(int c, int *p){
  if(c) {
     return *p;
  } else {
     return *p + 1;
  }
}
```
An argument `p` is dereferenced whatever c's value is.

For an instruction `CtxI`, we accumulate branch instructions in the must-be-executed-context of `CtxI` and then, we take the conjunction of the successors' known state.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D65593
2020-03-09 14:29:26 +09:00
Sanjay Patel a69158c12a [VectorCombine] fold extract-extract-op with different extraction indexes
opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1

The first part of this patch generalizes the cost calculation to accept
different extraction indexes. The second part creates a shuffle+extract
before feeding into the existing code to create a vector op+extract.

The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc"
rather than "TargetTransformInfo::SK_Broadcast" (splat specifically
from element 0) because we do not have a more general "SK_Splat"
currently. That does not affect any of the current regression tests,
but we might be able to find some cost model target specialization where
that comes into play.

I suspect that we can expose some missing x86 horizontal op codegen with
this transform, so I'm speculatively adding a debug flag to disable the
binop variant of this transform to allow easier testing.

The test changes show that we're sensitive to cost model diffs (as we
should be), so that means that patches like D74976
should have better coverage.

Differential Revision: https://reviews.llvm.org/D75689
2020-03-08 09:57:55 -04:00
Tyker 57c964aaa7 [AssumeBundles] Move to IR so it can be used by Analysis
Summary:
Assume bundles need to be usable by Analysis and Transforms/Utils isn't.
so this commit moves utilities to deal with asusme bundles to IR.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75618
2020-03-08 12:21:50 +01:00
Tyker 84056394e9 [AssumeBundles] Add API to query a bundles from a use
Summary: Finding what information is know about a value from a use is generally useful and can be done quickly.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75616
2020-03-08 12:04:23 +01:00
Nikita Popov 51a466a61f [InstCombine] Fix known bits handling in SimplifyDemandedUseBits
Fixes a regression from D75801. SimplifyDemandedUseBits() is also
supposed to compute the known bits (of the demanded subset) of the
instruction. For unknown instructions it does so by directly calling
computeKnownBits(). For known instructions it will compute known
bits itself. However, for instructions where only some cases are
handled directly (e.g. a constant shift amount) the known bits
invocation for the unhandled case is sometimes missing. This patch
adds the missing calls and thus removes the main discrepancy with
ExpensiveCombines mode.

Differential Revision: https://reviews.llvm.org/D75804
2020-03-07 18:16:41 +01:00
Stefanos Baziotis 01c48d7d11 [Attributor] Fold terminators before changing instructions to unreachable
It is possible that an instruction to be changed to unreachable is
in the same block with a terminator that can be constant-folded.
In this case, as of now, the instruction will be changed to
unreachable before the terminator is folded. But, then the
whole BB becomes invalidated and so when we go ahead to fold
the terminator, we trap.

Change the order of these two.

Differential Revision: https://reviews.llvm.org/D75780
2020-03-07 12:38:44 +02:00
Andrew Monshizadeh c5a06019d2 Extend TimeTrace to LLVM's new pass manager
With the addition of the LLD time tracing it made sense to include coverage
for LLVM's various passes. Doing so ensures that ThinLTO is also covered
with a time trace.

Before:
{F11333974}

After:
{F11333928}

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D74516
2020-03-06 14:45:19 -08:00
Anna Thomas 59029b9eef [RS4GC] Handle uses of extractelement for conversion from vector to scalar base
As mentioned in the comments, extractelement is special
since we actually want a scalar base for that element we extracted from
the vector (i.e. not a vector base).
This same logic should apply to uses of the extractelement such as phis
and selects which have the same BDV as the extractelement.
Howeber, for these uses we conservatively mark the BDV state as
conflict, since setting the EE's new base BDV does not always dominate
these uses.

Added testcase showcases the problem where the BDV identification chokes
on the incorrect cast from vector to scalar for the phi use of
extractelement.

Tests-Run: make check, internal fuzzer testing

Reviewers: reames, skatkov, dantrushin
Reviewed-By: dantrushin

Differential Revision: https://reviews.llvm.org/D75704
2020-03-06 16:28:49 -05:00
Roman Lebedev 1badf7c33a
[InstComine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant
Summary:
This is potentially more friendly for further optimizations,
analysies, e.g.: https://godbolt.org/z/G24anE

This resolves phase-ordering bug that was introduced
in D75145 for https://godbolt.org/z/2gBwF2
https://godbolt.org/z/XvgSua

Reviewers: spatel, nikic, dmgreen, xbolva00

Reviewed By: nikic, xbolva00

Subscribers: hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75757
2020-03-06 21:39:07 +03:00
Jay Foad 11d1573bb6 [APFloat] Make use of new overloaded comparison operators. NFC.
Reviewers: ekatz, spatel, jfb, tlively, craig.topper, RKSimon, nikic, scanon

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75744
2020-03-06 16:42:53 +00:00
Fangrui Song 952ee0df9e ThinLTOBitcodeWriter: drop dso_local when a GlobalVariable is converted to a declaration
If we infer the dso_local flag for -fpic, dso_local should be dropped
when we convert a GlobalVariable a declaration. dso_local causes the
generation of direct access (e.g. R_X86_64_PC32). Such relocations referencing
STB_GLOBAL STV_DEFAULT objects are not allowed in a -shared link.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D74749
2020-03-05 18:09:33 -08:00
Zhongduo Lin eae228a292 [IndVarSimplify] Extend previous special case for load use instruction to any narrow type loop variant to avoid extra trunc instruction
Summary:
The widenIVUse avoids generating trunc by evaluating the use as AddRec, this
will not work when:
   1) SCEV traces back to an instruction inside the loop that SCEV can not
expand, eg. add %indvar, (load %addr)
   2) SCEV finds a loop variant, eg. add %indvar, %loopvariant

While SCEV fails to avoid trunc, we can still try to use instruction
combining approach to prove trunc is not required. This can be further
extended with other instruction combining checks, but for now we handle the
following case (sub can be "add" and "mul", "nsw + sext" can be "nus + zext")
```
Src:
  %c = sub nsw %b, %indvar
  %d = sext %c to i64
Dst:
  %indvar.ext1 = sext %indvar to i64
  %m = sext %b to i64
  %d = sub nsw i64 %m, %indvar.ext1
```
Therefore, as long as the result of add/sub/mul is extended to wide type with
right extension and overflow wrap combination, no
trunc is required regardless of how %b is generated. This pattern is common
when calculating address in 64 bit architecture.

Note that this patch reuse almost all the code from D49151 by @az:
https://reviews.llvm.org/D49151

It extends it by providing proof of why trunc is unnecessary in more general case,
it should also resolve some of the concerns from the following discussion with @reames.

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180910/585945.html

Reviewers: sanjoy, efriedma, sebpop, reames, az, javed.absar, amehsan

Reviewed By: az, amehsan

Subscribers: hiraditya, llvm-commits, amehsan, reames, az

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73059
2020-03-05 16:27:59 -05:00
Hiroshi Yamauchi 76b9901fb1 [PGO][PGSO] Use IsColdXNthPercentile for sample PGO.
Summary:
This performs better for sample PGO.
NFC as PGSOColdCodeOnlyForSamplePGO is still true.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75550
2020-03-05 09:54:54 -08:00
Florian Hahn 40e7bfc424 [VPlan] Use consecutive numbers to print VPValues instead of addresses.
Currently when printing VPValues we use the object address, which makes
it hard to distinguish VPValues as they usually are large numbers with
varying distance between them.

This patch adds a simple slot tracker, similar to the ModuleSlotTracker
used for IR values. In order to dump a VPValue or anything containing a
VPValue, a slot tracker for the enclosing VPlan needs to be created. The
existing VPlanPrinter can take care of that for the existing code. We
assign consecutive numbers to each VPValue we encounter in a reverse
post order traversal of the VPlan.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D73078
2020-03-05 14:55:15 +00:00
Simon Pilgrim 01a91a6de7 Fix static analyzer uninitialized variable warning. NFCI. 2020-03-05 14:22:24 +00:00
Jun Ma b10deb9487 [Coroutines] Optimized coroutine elision based on reachability
Differential Revision: https://reviews.llvm.org/D75440
2020-03-05 14:43:50 +08:00
Sameer Sahasrabuddhe 42febbab91 StructurizeCFG: simplify phi nodes when possible
After structurization, some phi nodes can have a single incoming edge
and can be simplified away. This change runs a simplify query on all
phis that are either modified or added by the structurizer. This also
moves some phis closer to their use as a side benefit.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D75500
2020-03-05 10:33:15 +05:30
Guozhi Wei ee9a3eba76 [CodeGenPrepare] Handle ExtractValueInst in dupRetToEnableTailCallOpts
As the test case shows if there is an ExtractValueInst in the Ret block, function dupRetToEnableTailCallOpts can't duplicate it into the block containing call. So later no tail call is generated in CodeGen.

    This patch adds the ExtractValueInst handling code in function dupRetToEnableTailCallOpts and FoldReturnIntoUncondBranch, and later tail call can be generated for this case.

Differential Revision: https://reviews.llvm.org/D74242
2020-03-04 11:10:32 -08:00
David Green 38e532278e [LSR] Add masked load and store handling
This teaches Loop Strength Reduction the details about masked load and
store address operands, so that it can have a better time optimising
them as it would for normal loads and stores.

Differential Revision: https://reviews.llvm.org/D75371
2020-03-04 18:36:10 +00:00
Nikita Popov 293d813020 [InstCombine] Don't explicitly invoke const folding in shift combine
InstCombine uses an IRBuilder that automatically performs
target-dependent constant folding, so explicitly invoking it here
is not necessary.
2020-03-04 18:33:00 +01:00
Nikita Popov 9b5de84e27 [InstCombine] Use IRBuilder to create bitcast
This makes sure that the constant expression bitcast goes through
target-dependent constant folding, and thus avoids an additional
iteration of InstCombine.
2020-03-04 18:28:38 +01:00
Nikita Popov 0e890cd4d4 [ConstantFolding] Always return something from ConstantFoldConstant
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.

This is confusing and makes consumer code more complicated.
I would expect either that ConstantFoldConstant() returns only if
it actually folded something, or that it always returns non-null.
I'm going to the latter possibility here, which appears to be more
useful considering existing usage.

Differential Revision: https://reviews.llvm.org/D75543
2020-03-04 18:24:47 +01:00
Sanjay Patel 71a316883d [PassManager] adjust VectorCombine placement
The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs:
https://bugs.llvm.org/show_bug.cgi?id=45015
https://bugs.llvm.org/show_bug.cgi?id=42022

This patch contains a few independent changes:

1. Move the pass up in the pipeline, so it happens just after loop-vectorization.
   This is only to keep vectorization passes together in the pipeline at the moment.
   I don't have evidence of interaction between these yet.
2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was
   partly proposed as far back as rL219644 (which is why it's effectively being moved
   in the old PM code). This is important because the subsequent -instcombine doesn't
   work as well without EarlyCSE. With the CSE, -instcombine is able to squash
   shuffles together in 1 of the tests (because those are simple "select" shuffles).
3. Remove the -vector-combine pass that was running after SLP. We may want to do that
   eventually, but I don't have a test case to support it yet.

Differential Revision: https://reviews.llvm.org/D75145
2020-03-04 11:10:49 -05:00
Matt Arsenault f9047ede58 LICM: Reorder condition checks
Check the fast math flag before the more expensive loop check.
2020-03-03 17:15:57 -05:00
Brian Gesiak aa85b437a9 [Coroutines] Use dbg.declare for frame variables
Summary:
https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f
is an example of a small C++ program that uses C++20 coroutines that
is difficult to debug, due to the loss of debug info for variables that
"spill" across coroutine suspension boundaries. This patch addresses
that issue by inserting 'llvm.dbg.declare' intrinsics that point the
debugger to the variables' location at an offset to the coroutine frame.

With this patch, I confirmed that running the 'frame variable' commands in
https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f at
the specified breakpoints results in the correct values being printed
for coroutine frame variables 'i' and 'j' when using an lldb built from
trunk, as well as with gdb 8.3 (lldb 9.0.1, however, could not print the
values). The added test case also verifies this improved behavior.

The existing coro-debug.ll test case is also modified to reflect the
locations at which Clang actually places calls to 'dbg.declare', and
additional checks are added to ensure this patch works as intended in that
example as well.

Reviewers: vsk, jmorse, GorNishanov, lewissbaker, wenlei

Subscribers: EricWF, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75338
2020-03-03 17:13:46 -05:00
Tyker c5ec8890c9 [NFC] Try fix ubsan buildbot after 876d133789 2020-03-03 17:53:02 +01:00
Tyker 876d133789 [AssumeBundles] Add API to fill a map from operand bundles of an llvm.assume.
Summary: This patch adds a new way to query operand bundles of an llvm.assume that is much better suited to some users like the Attributor that need to do many queries on the operand bundles of llvm.assume. Some modifications of the IR like replaceAllUsesWith can cause information in the map to be outdated, so this API is more suited to analysis passes and passes that don't make modification that could invalidate the map.

Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75020
2020-03-03 14:22:52 +01:00
Florian Hahn 05afa55521 [VPlan] Add getPlan() to VPBlockBase.
This patch adds a getPlan accessor to VPBlockBase, which finds the entry
block of the plan containing the block and returns the plan set for this
block.

VPBlockBase contains a VPlan pointer, but it should only be set for
the entry block of a plan. This allows moving blocks without updating
the pointer for each moved block and in the future we might introduce a
parent relationship between plans and blocks, similar to the one in LLVM IR.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D74445
2020-03-03 13:20:13 +00:00
David Green ec7e4a9a80 [LoopVectorizer] Add reduction tests for inloop reductions. NFC
Also adds a force-reduction-intrinsics option for testing, for forcing
the generation of reduction intrinsics even when the backend is not
requesting them.
2020-03-03 10:54:00 +00:00
Alok Kumar Sharma 6f029dadf6 [DebugInfo] Avoid generating duplicate llvm.dbg.value
Summary:
This is to avoid generating duplicate llvm.dbg.value instrinsic if it already exists after the Instruction.

Before inserting llvm.dbg.value instruction, LLVM checks if the same instruction is already present before the instruction to avoid duplicates.
Currently it misses to check if it already exists after the instruction.
flang generates IR like this.

%4 = load i32, i32* %i1_311, align 4, !dbg !42
call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33
When this IR is processed in llvm, it ends up inserting duplicates.
%4 = load i32, i32* %i1_311, align 4, !dbg !42
call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33
call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33
We have now updated LdStHasDebugValue to include the cases when instruction is already
followed by same dbg.value instruction we intend to insert.

Now,

Definition and usage of function LdStHasDebugValue are deleted.
RemoveRedundantDbgInstrs is called for the cleanup of duplicate dbg.value's

Testing:
Added unit test for validation
check-llvm
check-debuginfo (the debug info integration tests)

Reviewers: aprantl, probinson, dblaikie, jmorse, jini.susan.george
SouraVX, awpandey, dstenb, vsk

Reviewed By: aprantl, jmorse, dstenb, vsk

Differential Revision: https://reviews.llvm.org/D74030
2020-03-03 09:56:45 +05:30
Juneyoung Lee 9f1f244d3c [LICM] Allow freeze to hoist/sink out of a loop
Summary: This patch allows LICM to hoist/sink freeze instructions out of a loop.

Reviewers: reames, fhahn, efriedma

Reviewed By: reames

Subscribers: jfb, lebedev.ri, hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75400
2020-03-03 12:29:39 +09:00
Sumanth Gundapaneni 9897daa6bf Update LSR's logic that identifies a post-increment SCEV value.
One of the checks has been removed as it seem invalid.
The LoopStep size is always almost a 32-bit.

Differential Revision: https://reviews.llvm.org/D75079
2020-03-02 16:34:18 -06:00
Teresa Johnson 80bf137fa1 Revert "Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP""
This reverts commit 80d0a137a5, and the
follow on fix in 873c0d0786. It is
causing test failures after a multi-stage clang bootstrap. See
discussion on D73242 and D75201.
2020-03-02 14:02:13 -08:00
Teresa Johnson 873c0d0786 [ThinLTO/LowerTypeTests] Handle unpromoted local type ids
Summary:
Fixes an issue that cropped up after the changes in D73242 to delay
the lowering of type tests. LTT couldn't handle any type tests with
non-string type id (which happens for local vtables, which we try to
promote during the compile step but cannot always when there are no
exported symbols).

We can simply treat the same as having an Unknown resolution, which
delays their lowering, still allowing such type tests to be used in
subsequent optimization (e.g. planned usage during ICP). The final
lowering which simply removes these handles them fine.

Beefed up an existing ThinLTO test for such unpromoted type ids so that
the internal vtable isn't removed before lower type tests, which hides
the problem.

Reviewers: evgeny777, pcc

Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, aganea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75201
2020-03-02 09:31:44 -08:00
Arkady Shlykov 3dcaf296ae [Loop Peeling] Add possibility to enable peeling on loop nests.
Summary:
Current peeling implementation bails out in case of loop nests.
The patch introduces a field in TargetTransformInfo structure that
certain targets can use to relax the constraints if it's
profitable (disabled by default).
Also additional option is added to enable peeling manually for
experimenting and testing purposes.

Reviewers: fhahn, lebedev.ri, xbolva00

Reviewed By: xbolva00

Subscribers: RKSimon, xbolva00, hiraditya, zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D70304
2020-03-02 08:37:11 -08:00
David Green d0d38df091 [LoopVectorizer] Change types of lists from pointers to references. NFC
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.

Differential Revision: https://reviews.llvm.org/D75448
2020-03-02 15:04:41 +00:00
Reid Kleckner 1adbe86d87 [WinEH] Fix inttoptr+phi optimization in presence of catchswitch
getFirstInsertionPt's return value must be checked for validity before
casting it to Instruction*. Don't attempt to insert casts after a phi in
a catchswitch block.

Fixes PR45033, introduced in D37832.

Reviewed By: davidxl, hfinkel

Differential Revision: https://reviews.llvm.org/D75381
2020-03-01 07:49:28 -08:00
Juneyoung Lee 5cbb265694 [GVN] Fold equivalent freeze instructions
Summary:
This patch defines two freeze instructions to have the same value number if they are equivalent.

This is allowed because GVN replaces all uses of a duplicated instruction with another.

If it partially rewrites use, it is not allowed. e.g)

```
a = freeze(x)
b = freeze(x)
use(a)
use(a)
use(b)
=>
use(a)
use(b) // This is not allowed!
use(b)
```

Reviewers: fhahn, reames, spatel, efriedma

Reviewed By: fhahn

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75398
2020-03-01 07:32:05 +09:00
Simon Pilgrim 7e9747b50b [X86][F16C] Remove cvtph2ps intrinsics and use generic half2float conversion (PR37554)
This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default.

Differential Revision: https://reviews.llvm.org/D75162
2020-02-29 18:57:35 +00:00
Vedant Kumar dd1ea9de2e Reland: [Coverage] Revise format to reduce binary size
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).

---

Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2020-02-28 18:12:04 -08:00
Vedant Kumar 3388871714 Revert "[Coverage] Revise format to reduce binary size"
This reverts commit 99317124e1. This is
still busted on Windows:

http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/40873

The llvm-cov tests report 'error: Could not load coverage information'.
2020-02-28 18:03:15 -08:00
Vedant Kumar 99317124e1 [Coverage] Revise format to reduce binary size
Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2020-02-28 17:33:25 -08:00
Matt Morehouse 30bb737a75 [DFSan] Add __dfsan_cmp_callback.
Summary:
When -dfsan-event-callbacks is specified, insert a call to
__dfsan_cmp_callback on every CMP instruction.

Reviewers: vitalybuka, pcc, kcc

Reviewed By: kcc

Subscribers: hiraditya, #sanitizers, eugenis, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D75389
2020-02-28 15:49:44 -08:00
Matt Morehouse f668baa459 [DFSan] Add __dfsan_mem_transfer_callback.
Summary:
When -dfsan-event-callbacks is specified, insert a call to
__dfsan_mem_transfer_callback on every memcpy and memmove.

Reviewers: vitalybuka, kcc, pcc

Reviewed By: kcc

Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D75386
2020-02-28 15:48:25 -08:00
Matt Morehouse 52f889abec [DFSan] Add __dfsan_load_callback.
Summary:
When -dfsan-event-callbacks is specified, insert a call to
__dfsan_load_callback() on every load.

Reviewers: vitalybuka, pcc, kcc

Reviewed By: vitalybuka, kcc

Subscribers: hiraditya, #sanitizers, llvm-commits, eugenis, kcc

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D75363
2020-02-28 14:26:09 -08:00
Austin Kerbow 4fa63fd452 [VectorCombine] Fix assert on compare extract index
Extract index could be a differnet integral type.

Differential Revision: https://reviews.llvm.org/D75327
2020-02-28 10:37:08 -08:00
Valery N Dmitriev d723ec4f04 [SLP][NFC] Assert that tree entry operands completed when scheduler looks for dependencies.
This change adds an assertion to prevent tricky bug related to recursive
approach of building vectorization tree. For loop below takes number of
operands directly from tree entry rather than from scalars.
If the entry at this moment turns out incomplete (i.e. not all operands set)
then not all the dependencies will be seen by the scheduler.
This can lead to failed scheduling (and thus failed vectorization)
for perfectly vectorizable tree.
Here is code example which is likely to fire the assertion:
for (i : VL0->getNumOperands()) {
  ...
  TE->setOperand(i, Operands);
  buildTree_rec(Operands, Depth + 1,...);
}

Correct way is two steps process: first set all operands to a tree entry
and then recursively process each operand.

Differential Revision: https://reviews.llvm.org/D75296
2020-02-28 10:34:48 -08:00
Hiroshi Yamauchi f16d2bec40 Devirtualize a call on alloca without waiting for post inline cleanup and next DevirtSCCRepeatedPass iteration.
This aims to fix a missed inlining case.

If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.

This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.

This is a second commit after a revert
https://reviews.llvm.org/rG4569b3a86f8a4b1b8ad28fe2321f936f9d7ffd43 and a fix
https://reviews.llvm.org/rG41e06ae7ba91.

Differential Revision: https://reviews.llvm.org/D69591
2020-02-28 09:43:32 -08:00
Hiroshi Yamauchi 41e06ae7ba [CallPromotionUtils] Add missing promotion legality check to tryPromoteCall.
Summary: This fixes the crash that led to the revert of D69591.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75307
2020-02-28 09:35:09 -08:00
Valery N Dmitriev 02e5e47e17 [SLP][NFC] Delete some unreachable code.
This patch deletes some dead code out of SLP vectorizer.
Couple of changes taken out of D57059 to slightly lighten it
plus one more similar case fixed.

Differential Revision: https://reviews.llvm.org/D75276
2020-02-28 09:22:51 -08:00
Teresa Johnson f9ca75f19b [Inliner] Inlining should honor nobuiltin attributes
Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.

The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.

Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.

Reviewers: hfinkel, gchatelet, chandlerc, davidxl

Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74162
2020-02-28 07:34:14 -08:00
Pierre-vh 2809abbd98 [Transform][MemCpyOpt] Add missing DebugLoc to %tmpbitcast
Fix for https://bugs.llvm.org/show_bug.cgi?id=37967

Differential Revision: https://reviews.llvm.org/D75173
2020-02-28 15:20:51 +00:00
Juneyoung Lee cc28a75467 Let EarlyCSE fold equivalent freeze instructions
Summary:
This patch makes EarlyCSE fold equivalent freeze instructions.

Another optimization that I think will be useful is to remove freeze if its operand is used as a branch condition or at llvm.assume:

```
  %c = ...
  br i1 %c, label %A, ..
A:
  %d = freeze %c ; %d can be optimized to %c because %c cannot be poison or undef (or 'br %c' would be UB otherwise)
```

If it make sense for EarlyCSE to support this as well, I will make a patch for this.

Reviewers: spatel, reames, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75334
2020-02-28 20:35:20 +09:00
Hans Wennborg d48c981697 SROA: Don't drop atomic load/store alignments (PR45010)
SROA will drop the explicit alignment on allocas when the ABI guarantees
enough alignment. Because the alignment on new load/store instructions
are set based on the alloca's alignment, that means SROA would end up
dropping the alignment from atomic loads and stores, which is not
allowed (see bug). For those, make sure to always carry over the
alignment from the previous instruction.

Differential revision: https://reviews.llvm.org/D75266
2020-02-28 10:38:40 +01:00
Jun Ma 43c8307c6c [Coroutines] CoroElide enhancement
Fix regression of CoreElide pass when current function is
coroutine.

Differential Revision: https://reviews.llvm.org/D71663
2020-02-28 10:41:59 +08:00
Juneyoung Lee 2b5a897651 Revert "[SimpleLoopUnswitch] Fix introduction of UB when hoisted condition may be undef or poison"
.. due to performance regression.

This patch is reverted until infrastructore for CSE/LICM support for freeze is
added.

This reverts commit 181628b
2020-02-28 11:10:46 +09:00
Eli Friedman b299926453 [IndVars] Fix sort comparator.
std::sort will compare an element to itself in some cases.  We should
not crash if this happens.

Differential Revision: https://reviews.llvm.org/D75000
2020-02-27 17:25:18 -08:00
Matt Morehouse 470db54cbd [DFSan] Add flag to insert event callbacks.
Summary:
For now just insert the callback for stores, similar to how MSan tracks
origins.  In the future we may want to add callbacks for loads, memcpy,
function calls, CMPs, etc.

Reviewers: pcc, vitalybuka, kcc, eugenis

Reviewed By: vitalybuka, kcc, eugenis

Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits, kcc

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D75312
2020-02-27 17:14:19 -08:00
Matt Morehouse 2a29617b9d [DFSan] Remove unused IRBuilder. NFC
Reviewers: pcc, vitalybuka, kcc

Reviewed By: kcc

Subscribers: hiraditya, llvm-commits, kcc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75190
2020-02-27 16:27:20 -08:00
Artur Pilipenko 02e3d5c3a2 Fix DSE miscompile when store is clobbered across loop iterations
DSE would mistakenly remove store (2):

  a = calloc(n+1)
  for (int i = 0; i < n; i++) {
    store 1, a[i+1] // (1)
    store 0, a[i]   // (2)
  }

The fix is to do PHI transaltion while looking for clobbering
instructions between the store and the calloc.

Reviewed By: efriedma, bjope

Differential Revision: https://reviews.llvm.org/D68006
2020-02-27 14:43:01 -08:00
Nikita Popov 4ef272ec9c [InstCombine] DCE instructions earlier
When InstCombine initially populates the worklist, it already
performs constant folding and DCE. However, as the instructions
are initially visited in program order, this DCE can pick up only
the last instruction of a dead chain, the rest would only get
picked up in the main InstCombine run.

To avoid this, we instead perform the DCE in separate pass over the
collected instructions in reverse order, which will allow us to
pick up full dead instruction chains. We already need to do this
reverse iteration anyway to populate the worklist, so this
shouldn't add extra cost.

This by itself only fixes a small part of the problem though:
The same basic issue also applies during the main InstCombine loop.
We generally always want DCE to occur as early as possible,
because it will allow one-use folds to happen. Address this by also
performing DCE while adding deferred instructions to the main worklist.

This drops the number of tests that perform more than 2 InstCombine
iterations from ~80 to ~40. There's some spurious test changes due
to operand order / icmp toggling.

Differential Revision: https://reviews.llvm.org/D75008
2020-02-27 18:45:59 +01:00
Simon Moll ddd11273d9 Remove BinaryOperator::CreateFNeg
Use UnaryOperator::CreateFNeg instead.

Summary:
With the introduction of the native fneg instruction, the
fsub -0.0, %x idiom is obsolete. This patch makes LLVM
emit fneg instead of the idiom in all places.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D75130
2020-02-27 09:06:03 -08:00
Pierre-vh f64e457cb7 [Transforms][Debugify] Ignore PHI nodes when checking for DebugLocs
Fix for: https://bugs.llvm.org/show_bug.cgi?id=37964

Differential Revision: https://reviews.llvm.org/D75242
2020-02-27 16:14:11 +00:00
Kirill Bobyrev 4569b3a86f
Revert "Devirtualize a call on alloca without waiting for post inline cleanup and next"
This reverts commit 59fb9cde7a.

The patch caused internal miscompilations.
2020-02-27 15:58:39 +01:00
Jay Foad f41e82c82c [InstCombine] Fix confusing variable name. 2020-02-27 11:27:49 +00:00
Nemanja Ivanovic c46b85aaf4 [LoopVectorize] Fix cost for calls to functions that have vector versions
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.

This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).

Differential revision: https://reviews.llvm.org/D74944
2020-02-26 21:39:11 -06:00
Sanjay Patel 25c6544f32 [VectorCombine] add a debug flag to skip all transforms
As suggested in D75145 -

I'm not sure why, but several passes have this kind of disable/enable flag
implemented at the pass manager level. But that means we have to duplicate
the flag for both pass managers and add code to check the flag every time
the pass appears in the pipeline.

We want a debug option to see if this pass is misbehaving regardless of the
pass managers, so just add a disablement check at the single point before
any transforms run.

Differential Revision: https://reviews.llvm.org/D75204
2020-02-26 15:15:42 -05:00
Nikita Popov 00f54050f7 [SimpleLoopUnswitch] Remove unnecessary include; NFC 2020-02-26 20:40:43 +01:00
Nikita Popov 9d9633fb70 [CVP] Simplify cmp of local phi node
CVP currently does not simplify cmps with instructions in the same
block, because LVI getPredicateAt() currently does not provide
much useful information for that case (D69686 would change that,
but is stuck.) However, if the instruction is a Phi node, then
LVI can compute the result of the predicate by threading it into
the predecessor blocks, which allows it simplify some conditions
that nothing else can handle. Relevant code:
6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)

Differential Revision: https://reviews.llvm.org/D72169
2020-02-26 20:36:41 +01:00
Nikita Popov 7da3b5e45c [InstCombine] Simplify DCE code; NFC
As pointed out on D75008, MadeIRChange is already set by
eraseInstFromFunction(), which also already does a debug print.
2020-02-26 20:33:00 +01:00
Nikita Popov 56f7de5baa [InstCombine] Remove trivially empty ranges from end
InstCombine removes pairs of start+end intrinsics that don't
have anything in between them. Currently this is done by starting
at the start intrinsic and scanning forwards. This patch changes
it to start at the end intrinsic and scan backwards.

The motivation here is as follows: When we process the start
intrinsic, we have not yet looked at the following instructions,
which may still get folded/removed. If they do, we will only be
able to remove the start/end pair on the next iteration. When we
process the end intrinsic, all the instructions before it have
already been visited, and we don't run into this problem.

Differential Revision: https://reviews.llvm.org/D75011
2020-02-26 20:04:11 +01:00
Hiroshi Yamauchi 59fb9cde7a Devirtualize a call on alloca without waiting for post inline cleanup and next
DevirtSCCRepeatedPass iteration.  Needs ReviewPublic

This aims to fix a missed inlining case.

If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.

This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
2020-02-26 09:51:24 -08:00
Reid Kleckner 465dca79b3 Avoid SmallString.h include in MD5.h, NFC
Saves 200 includes, which is mostly immaterial.
2020-02-26 09:10:24 -08:00
Hans Wennborg 546918cbb4 Revert "[compiler-rt] Add a critical section when flushing gcov counters"
See discussion on PR44792.

This reverts commit 02ce9d8ef5.

It also reverts the follow-up commits
8f46269f0 "[profile] Don't dump counters when forking and don't reset when calling exec** functions"
62c7d8402 "[profile] gcov_mutex must be static"
2020-02-26 13:27:44 +01:00
Juneyoung Lee 1cb7ec870d [SimpleLoopUnswitch] Canonicalize variable names 2020-02-26 15:33:02 +09:00
Juneyoung Lee 181628b52d [SimpleLoopUnswitch] Fix introduction of UB when hoisted condition may be undef or poison
Summary:
Loop unswitch hoists branches on loop-invariant conditions. However, if this
condition is poison/undef and the branch wasn't originally reachable, loop
unswitch introduces UB (since the optimized code will branch on poison/undef and
the original one didn't)).
We fix this problem by freezing the condition to ensure we don't introduce UB.

We will now transform the following:
  while (...) {
    if (C) { A }
    else   { B }
  }

Into:
  C' = freeze(C)
  if (C') {
    while (...) { A }
  } else {
    while (...) { B }
  }

This patch fixes the root cause of the following bug reports (which use the old loop unswitch, but can be reproduced with minor changes in the code and -enable-nontrivial-unswitch):
- https://llvm.org/bugs/show_bug.cgi?id=27506
- https://llvm.org/bugs/show_bug.cgi?id=31652

Reviewers: reames, majnemer, chenli, sanjoy, hfinkel

Reviewed By: reames

Subscribers: hiraditya, jvesely, nhaehnle, filcab, regehr, trentxintong, nlopes, llvm-commits, mzolotukhin

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D29015
2020-02-26 13:47:33 +09:00
Johannes Doerfert 396b725394 [OpenMP][Opt] Combine `struct ident_t*` during deduplication
If we deduplicate OpenMP runtime calls we have multiple `ident_t*` that
represent information like source location. So far, we simply kept the
one used by the replacement call. However, as exposed by PR44893, that
can cause problems if we have stack allocated `ident_t` objects. While
we need to revisit the use of these as well, it is clear that we
eventually want to merge source location information in some way. With
this patch we add the infrastructure to do so but without doing the
actual merge. Instead we pick a global `ident_t` from the replaced
calls, if possible, or create a new one with an unknown location
instead.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D74925
2020-02-25 14:07:14 -08:00
Akira Hatanaka 430512ed7d [ObjC][ARC] Don't move a retain call living outside a loop into the loop
body

We started seeing cases where ARC optimizer would move retain calls into
loop bodies, causing imbalance in the number of retain and release
calls, after changes were made to delete inert ARC calls since the inert
calls that used to block code motion are gone.

Fix the bug by setting the CFG hazard flag when visiting a loop header.

rdar://problem/56908836
2020-02-25 13:00:10 -08:00
Roman Lebedev 400ceda425
[SCEV][IndVars] Always provide insertion point to the SCEVExpander::isHighCostExpansion()
Summary: This addresses the `llvm/test/Transforms/IndVarSimplify/elim-extend.ll` `@nestedIV` regression from D73728

Reviewers: reames, mkazantsev, wmi, sanjoy

Reviewed By: mkazantsev

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73777
2020-02-25 23:05:59 +03:00
Roman Lebedev 44edc6fd2c
[SCEV] rewriteLoopExitValues(): even if have hard uses, still rewrite if cheap (PR44668)
Summary:
Replacing uses of IV outside of the loop is likely generally useful,
but `rewriteLoopExitValues()` is cautious, and if it isn't told to always
perform the replacement, and there are hard uses of IV in loop,
it doesn't replace.

In [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]],
that prevents `-indvars` from replacing uses of induction variable
after the loop, which might be one of the optimization failures
preventing that code from being vectorized.

Instead, now that the cost model is fixed, i believe we should be
a little bit more optimistic, and also perform replacement
if we believe it is within our budget.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]].

Reviewers: reames, mkazantsev, asbirlea, fhahn, skatkov

Reviewed By: mkazantsev

Subscribers: nikic, hiraditya, zzheng, javed.absar, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73501
2020-02-25 23:05:59 +03:00
Roman Lebedev b99c91a087
[NFC][SCEV] Piping to pass new SCEVCheapExpansionBudget option into SCEVExpander::isHighCostExpansionHelper()
Summary:
In future patches`SCEVExpander::isHighCostExpansionHelper()` will respect the budget allocated by performing TTI cost modelling.
This is a fully NFC patch to make things reviewable.

Reviewers: reames, mkazantsev, wmi, sanjoy

Reviewed By: mkazantsev

Subscribers: hiraditya, zzheng, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73705
2020-02-25 23:05:57 +03:00
Roman Lebedev 0789f28048
[NFC][SCEV] Piping to pass TTI into SCEVExpander::isHighCostExpansionHelper()
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.

Reviewers: reames, mkazantsev, wmi, sanjoy

Reviewed By: mkazantsev

Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73704
2020-02-25 23:05:56 +03:00
Philip Reames 14845b2c45 Revert "[LICM] Support hosting of dynamic allocas out of loops"
This reverts commit 8d22100f66.

There was a functional regression reported (https://bugs.llvm.org/show_bug.cgi?id=44996).  I'm not actually sure the patch is wrong, but I don't have time to investigate currently, and this line of work isn't something I'm likely to get back to quickly.
2020-02-25 09:05:31 -08:00
Roman Lebedev 2855c8fed9
[InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): fix miscompile (PR44802)
Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(),
as input, we have the following pattern:
  icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
We want to rewrite that as:
  icmp eq/ne (and (x shift (Q+K)), y), 0  iff (Q+K) u< bitwidth(x)

While we know that originally (Q+K) would not overflow
(because  2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.

To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.

https://bugs.llvm.org/show_bug.cgi?id=44802
2020-02-25 18:23:58 +03:00
Roman Lebedev 781d077afb
[InstCombine] reassociateShiftAmtsOfTwoSameDirectionShifts(): fix miscompile (PR44802)
As input, we have the following pattern:
  Sh0 (Sh1 X, Q), K
We want to rewrite that as:
  Sh x, (Q+K)  iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because  2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.

To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.

https://bugs.llvm.org/show_bug.cgi?id=44802
2020-02-25 18:23:51 +03:00
Sanjay Patel 10ea01d80d [VectorCombine] make cost calc consistent for binops and cmps
Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.

We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.

We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.

AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).

Differential Revision: https://reviews.llvm.org/D75062
2020-02-25 08:41:59 -05:00
Florian Hahn b8d638d337 [DSE,MSSA] Do not attempt to remove un-removable memdefs.
We have to skip MemoryDefs that cannot be removed. This fixes a crash in
the newly added test case and fixes a wrong case in
memset-and-memcpy.ll.
2020-02-25 13:31:46 +00:00
Hideto Ueno 2c0edbf19c [Attributor] Use AssumptionCache in AANonNullFloating::initialize 2020-02-25 13:00:03 +09:00
Ayke van Laethem 2a7a989c3e
[LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints
This patch adds bindings to C and Go for
addCoroutinePassesToExtensionPoints, which is used to add coroutine
passes to the correct locations in PassManagerBuilder.

Differential Revision: https://reviews.llvm.org/D51642
2020-02-24 20:15:51 +01:00
Calixte Denizet 8f46269f0c [profile] Don't dump counters when forking and don't reset when calling exec** functions
Summary:
There is no need to write out gcdas when forking because we can just reset the counters in the parent process.
Let say a counter is N before the fork, then fork and this counter is set to 0 in the child process.
In the parent process, the counter is incremented by P and in the child process it's incremented by C.
When dump is ran at exit, parent process will dump N+P for the given counter and the child process will dump 0+C, so when the gcdas are merged the resulting counter will be N+P+C.
About exec** functions, since the current process is replaced by an another one there is no need to reset the counters but just write out the gcdas since the counters are definitely lost.
To avoid to have lists in a bad state, we just lock them during the fork and the flush (if called explicitely) and lock them when an element is added.

Reviewers: marco-c

Reviewed By: marco-c

Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits, sylvestre.ledru

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D74953
2020-02-24 10:38:33 +01:00
Florian Hahn 7769030b93 Recommit "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.

This reverts the revert commit
c7fc0e5da6.
2020-02-23 18:33:18 +00:00
Florian Hahn af69d5e10e [DSE] Track overlapping stores.
Add a map from BasicBlocks to overlap intervals. For partial writes, we
can keep track of those in IOLs. We only add candidates that are valid
for eliminations.

Reviewers: dmgreen, bryant, asbirlea, Tyker

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D73757
2020-02-23 15:44:40 +00:00
Tyker 837d8129e9 [NFC] Remove some GCC warning from c9e93c84f6 2020-02-22 14:11:31 +01:00
Johannes Doerfert 9708279c72 [Attributor][FIX] Undo 16188f9 until SCC iterator bug is fixed
The buildbot
  http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win
shows some strange SCC iterator bug since 16188f9 which we need to
investigate. This patch should remove the part of 16188f9 that could
have exposed the problem.
2020-02-21 14:20:42 -08:00
Whitney Tsang 0a70edd696 [CloneFunction] Update loop headers after cloning all blocks in loop.
Summary:
Blocks in a loop can be in any order as long as the loop header is the
first block in Blocks.
With some order of Blocks, cloneLoopWithPreheader would trigger the
assertion in addBasicBlockToLoop.

Example:

define void @test(i64 %N) {
preheader.i:
  br label %header.i

header.i:
  %i = phi i64 [ 0, %preheader.i ], [ %inc.i, %latch.i ]
  br label %header.j

header.j:
  %j = phi i64 [ 0, %header.i ], [ %inc.j, %latch.j ]
  br label %header.k

header.k:
  %k = phi i64 [ 0, %header.j ], [ %inc.k, %latch.k ]
  call void @baz(i64 %i, i64 %j, i64 %k)
  br label %latch.k

latch.k:
  %inc.k = add nsw i64 %k, 1
  %cmp.k = icmp slt i64 %inc.k, %N
  br i1 %cmp.k, label %header.k, label %latch.j

latch.j:
  %inc.j = add nsw i64 %j, 1
  %cmp.j = icmp slt i64 %inc.j, %N
  br i1 %cmp.j, label %header.j, label %latch.i

latch.i:
  %inc.i = add nsw i64 %i, 1
  %cmp.i = icmp slt i64 %inc.i, %N
  br i1 %cmp.i, label %header.i, label %exit.i

exit.i:
  ret void
}
declare void @baz(i64, i64, i64)
If the blocks of loop-i is in the order: header.i, latch.k, header.k,
header.j, latch.j, latch.i,
then cloneLoopWithPreheader would trigger the assertion in
addBasicBlockToLoop
assert(contains(SameHeader) && getHeader() == SameHeader->getHeader() &&
"Incorrect LI specified for this loop!");

As latch.k is in both loop-j and loop-k, it would be set as the header
of both loops after adding latch.k.
If we update loop headers during cloning blocks, then after adding
header.k,
the header of loop-k would be updated with header.k,
while the header of loop-j stays as latch.k.

When adding header.j, SameHeader is loop-k, SameHeader->getHeader() is
header.k, but getHeader() is latch.k, which trigger the assertion.
Reviewer: jdoerfert, Meinersbur, fhahn, kbarton, hfinkel, bmahjour,
etiotto
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D74382
2020-02-21 22:18:24 +00:00
Sanjay Patel e9c79a7aef [VectorCombine] refactor to reduce duplicated code; NFC
This should be the last step in the current cleanup.
Follow-ups should resolve the TODO about cost calc
and enable the more general case where we extract
different elements.
2020-02-21 15:56:00 -05:00
Sanjay Patel 34e3485560 [VectorCombine] refactor cost calcs to reduce duplication; NFC
More cleanup is possible now, but we probably need to
resolve the TODO about the existing difference between
compares and binops.
2020-02-21 15:12:00 -05:00
Krzysztof Parzyszek d2b7c09e79 [Hexagon] Simplify intrinsic (vandvrt (vandqrt q b) m) -> q if possible
When each byte in b&m is non-zero, this conversion Q->V->Q is a no-op.
2020-02-21 13:56:04 -06:00
Nikita Popov b178555318 [InstCombine] Improve simplify demanded bits worklist management
This fixes a small mistake from D72944: The worklist add should
happen before assigning the new operand, not after.

In case an actual replacement happens, the old operand needs to
be added for DCE. If no actual replacement happens, then old/new
are the same, so it doesn't matter.

This drops one iteration from the annotated test case.
2020-02-21 18:51:41 +01:00
Nikita Popov 656dff9af4 [InstCombine] Use replaceOperand() in more places
Followup to D73919 with another batch of replacements of
setOperand() -> replaceOperand(), to make sure the old
operand gets DCEd right away.

Differential Revision: https://reviews.llvm.org/D74932
2020-02-21 18:41:16 +01:00
Florian Hahn 98f5268a72 [VectorUtils] Move ToVectorTy to VectorUtils.h (NFC).
ToVectorTy is defined and used in multiple places. Hoist it to
VectorUtils.h to avoid duplication and improve re-usability.

Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D74959
2020-02-21 17:31:24 +00:00
Nikita Popov a8db806d52 [SimplifyLibCalls][IRBuilder] Accept any IRBuilder in SimplifyLibCalls
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.

To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.

Differential Revision: https://reviews.llvm.org/D74792
2020-02-21 18:26:05 +01:00
Sanjay Patel fc4455891c [VectorCombine] refactor matching code to reduce duplication; NFC
cmp/binop were already diverging even though they are largely
the same logic.
2020-02-21 12:06:51 -05:00
Florian Hahn 134bab7cd5 [DSE,MSSA] Add debug counter.
Can be used like
-debug-counter=dse-memoryssa-skip=10,dse-memoryssa-counter-count=20

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72147
2020-02-21 17:04:37 +00:00
Bill Wendling 2fe457690d Filter callbr insts from critical edge splitting
Similarly to how splitting predecessors with an indirectbr isn't handled
in the generic way, we also shouldn't split callbrs, for similar
reasons.
2020-02-20 16:24:42 -08:00
Florian Hahn 99809f98d7 [SCCP] Do not mark unknown loads as overdefined.
For tracked globals that are unknown after solving, we expect all
non-store uses to be replaced.

This is a follow-up to f8045b250d, which removed forcedconstant.

We should not mark unknown loads as overdefined, as they either load
from an unknown pointer or an undef global. Restore the original logic
for loads.
2020-02-20 22:48:58 +01:00
dfukalov dbfc682e2b SpeculativeExecution: fixed ingoring free execution
Summary:
After updating cost model in AMDGPU target (47a5c36b37) the pass started to
ignore some BBs since they got all instructions estimated as free.

Reviewers: arsenm, chandlerc, nhaehnle

Reviewed By: nhaehnle

Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74825
2020-02-20 14:45:02 +03:00
Johannes Doerfert d95cb56649 [Attributor] Make sure abstract attributes are properly initialized 2020-02-20 02:46:40 -06:00
Johannes Doerfert 6185fb13d6 [Attributor][NFC] Refactor interface 2020-02-20 02:46:40 -06:00
Johannes Doerfert 8e76fec0ae [Attributor][NFC] Improve the debug output & add a TODO 2020-02-19 23:46:08 -06:00
Johannes Doerfert f8ad735729 [Attributor] Use existing `returned` information better
We can look through calls with `returned` argument attributes when we
collect subsuming positions. This allows us to get existing attributes
from more places.
2020-02-19 23:46:07 -06:00
Johannes Doerfert a801ee869d [Attributor][FIX] Avoid setting wrong load/store alignments 2020-02-19 23:46:07 -06:00
Johannes Doerfert e1eed6c5b9 [Attributor] Generalize `getAssumedConstantInt` interface
We are often interested in an assumed constant and sometimes it has to
be an integer constant. Before we only looked for the latter, now we can
ask for either.
2020-02-19 22:33:51 -06:00
Johannes Doerfert 16188f9d70 [Attributor][FIX] Do not create new calls edge we cannot handle
If we propagate function pointers across function boundaries we can
create new call edges. These need to be represented in the CG if we run
as a CGSCC pass. In the new pass manager that is currently not handled
by the CallGraphUpdater so we need to prevent the situation for now.
2020-02-19 22:33:51 -06:00
Johannes Doerfert 1e99fc9d58 [Attributor] Add initial AAIsDead for arguments
We usually will ask for liveness of an argument anyway so we ended up
lazily creating the attribute anyway. However, that is not always the
case and even if it is we should go the eager route here. Various tests
show how this can improve the outcome. One test exposed a problem with
type mismatches between argument and call site argument, a fix is
included. For liveness various more tests were added as well.
2020-02-19 21:39:45 -06:00
Johannes Doerfert c6ac717aa7 [Attributor] Allow multiple uses of a casted function pointer
If a function pointer is casted into a different type the resulting
expression can be a constant. If so, it can be used multiple times which
cannot be handled by the AbstractCallSite constructor alone. Instead, we
follow the cast expression uses now explicitly during the call site
traversal.
2020-02-19 20:43:38 -06:00
Michael Kruse e4d20ec8ad [IndVarSimply] Fix assert/release build difference.
In builds with assertions enabled (!NDEBUG), IndVarSimplify does an
additional query to ScalarEvolution which may change future SCEV queries
since it fills the internal cache differently. The result is actually
only used with the -verify-indvars command line option. We fix the issue
by only calling SE->getBackedgeTakenCount(L) if -verify-indvars is
enabled such that only -verify-indvars shows the behavior, but not debug
builds themselves. Also add a remark to the description of
-verify-indvars about this behavior.

Fixes llvm.org/PR44815

Differential Revision: https://reviews.llvm.org/D74810
2020-02-19 14:36:22 -06:00
Florian Hahn c7fc0e5da6 Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This reverts commit e01a3d49c2.
and commit a6a585b803.

This causes a failure on GreenDragon:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597
2020-02-19 19:37:08 +01:00
Florian Hahn e01a3d49c2 [PatternMatch] Match XOR variant of unsigned-add overflow check.
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.

This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D74228
2020-02-19 15:25:18 +01:00
Brian Gesiak 5a187d8ed1 [Coroutines][4/6] New pass manager: coro-cleanup
Summary:
Depends on https://reviews.llvm.org/D71900.

The fourth in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-cleanup'.

No existing regression tests check the behavior of coro-cleanup on its
own, so this patch adds one. (A test named 'coro-cleanup.ll' exists, but
it relies on the entire coroutines pipeline being run. It's updated to
test the new pass manager in the 5th patch of this series.)

Reviewers: GorNishanov, lewissbaker, chandlerc, junparser, deadalnix, wenlei

Reviewed By: wenlei

Subscribers: wenlei, EricWF, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71901
2020-02-19 00:30:27 -05:00
Brian Gesiak 2365238b9d Re-land new pass manager coro-split and coro-elide
This re-applies patches https://reviews.llvm.org/D71899 and
https://reviews.llvm.org/D71900, which were reverted in
https://reviews.llvm.org/rG11053a1cc61 and
https://reviews.llvm.org/rGe999aa38d16. The underlying problem that
caused two buildbots to fail with these patches is explained in
https://reviews.llvm.org/rG26f356350bd -- older compliers disagree with
the order in which the left- and right-hand side of an assignment in
LazyCallGraph ought to be evaluated, which caused an assertion in
SmallVector::operator[] to fire when the test suite was run.
2020-02-19 00:11:23 -05:00
Reid Kleckner 0c2b09a9b6 [IR] Lazily number instructions for local dominance queries
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.

The downside is that Instruction grows from 56 bytes to 64 bytes.  The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.

The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.

We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.

Reviewed By: lattner, vsk, fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D51664
2020-02-18 14:44:24 -08:00
Fangrui Song 13a97305ba [JumpThreading] Skip unconditional PredBB when threading jumps through two basic blocks
Fixes https://bugs.llvm.org/show_bug.cgi?id=44922 (caused by 4698bf145d)

ThreadThroughTwoBasicBlocks assumes PredBBBranch is conditional. The following code can segfault.

  AddPHINodeEntriesForMappedBlock(PredBBBranch->getSuccessor(1), PredBB, NewBB,
                                  ValueMapping);

We can also allow unconditional PredBB, but the produced code is not
better.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D74747
2020-02-18 11:01:46 -08:00
Tyker c9e93c84f6 Add Query API for llvm.assume holding attributes
Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72885
2020-02-18 19:42:07 +01:00
Huihui Zhang 8ee0e1dc02 [NFC] Silence compiler warning [-Wmissing-braces]. 2020-02-18 10:37:12 -08:00
Florian Hahn e32522ca17 [SLPVectorizer] Do not assume extracelement idx is a ConstantInt.
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.

The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.

Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D74758
2020-02-18 18:16:06 +01:00
Nikita Popov ec6c623ff9 [SimplifyLibCalls] Accept IRBuilderBase; NFC 2020-02-18 17:59:07 +01:00
Nikita Popov 28ffe38bba [LoopUtils] Accept IRBuilderBase; NFC 2020-02-18 17:58:46 +01:00
Nikita Popov ed6d30b517 [BuildLibCalls] Accept IRBuilderBase; NFC
Accept IRBuilderBase instead of IRBuilder<>. Remove dependency
on IRBuilder from header.
2020-02-18 17:58:16 +01:00
Nikita Popov 1ab37fad61 [InstCombine] Fix worklist management when simplifying demanded bits
When simplifying demanded bits, we currently only report the
instruction on which SimplifyDemandedBits was called as changed.
However, this is a recursive call, and the actually modified
instruction will usually be further up the chain. Additionally,
all the intermediate instructions should also be revisited,
as additional combines may be possible after the demanded bits
simplification. We fix this by explicitly adding them back to the
worklist.

Differential Revision: https://reviews.llvm.org/D72944
2020-02-18 17:55:40 +01:00
Nikita Popov c9540fe59b [InstCombine] Fix multi-use handling in cttz transform
The select-of-cttz transform can currently duplicate cttz intrinsics
and zext/trunc ops. The cause is that it unnecessarily duplicates
the intrinsic and the zext/trunc when setting the "undef_on_zero"
flag to false. However, it's always legal to set the flag from true
to false, so we can make this replacement even if there are extra users.

Differential Revision: https://reviews.llvm.org/D74685
2020-02-18 17:55:00 +01:00
Nikita Popov 9adedd146d [InstCombine] Relax preconditions for ashr+and+icmp fold (PR44754)
Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have
a fold that converts icmp (and (ashr X, C3), C2), C1 into
icmp (and C2'), C1', but it imposed overly strict requirements on the
transform.

Relax this by checking that both C2 and C1 don't shift out bits
(in a signed sense) when forming the new constants.

Alive proofs (https://rise4fun.com/Alive/PTz0):

    Name: ashr_legal
    Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1
    %a = ashr i16 %x, C3
    %b = and i16 %a, C2
    %c = icmp i16 %b, C1
    =>
    %d = and i16 %x, C2 << C3
    %c = icmp i16 %d, C1 << C3

    Name: ashr_shiftout_eq
    Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1
    %a = ashr i16 %x, C3
    %b = and i16 %a, C2
    %c = icmp eq i16 %b, C1
    =>
    %c = false

Note that >> corresponds to ashr here. The case of an equality
comparison has some special handling in this transform, because
it will form to a true/false result if the condition on the comparison
constant it violated.

Differential Revision: https://reviews.llvm.org/D74294
2020-02-18 17:49:46 +01:00
Florian Hahn 9063022573 [InstCombin] Avoid nested Create calls, to guarantee order.
The original code allowed creating the != checks in unpredictable order,
causing http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/34014
to fail.
2020-02-18 09:44:11 +01:00
Florian Hahn 6c85e92bcf [InstCombine] Simplify a umul overflow check to a != 0 && b != 0.
This patch adds a simplification if an OR weakens the overflow condition
for umul.with.overflow by treating any non-zero result as overflow. In that
case, we overflow if both umul.with.overflow operands are != 0, as in that
case the result can only be 0, iff the multiplication overflows.

Code like this is generated by code using __builtin_mul_overflow with
negative integer constants, e.g.
   bool test(unsigned long long v, unsigned long long *res) {
     return __builtin_mul_overflow(v, -4775807LL, res);
   }

```
----------------------------------------
Name: D74141
  %res = umul_overflow {i8, i1} %a, %b
  %mul = extractvalue {i8, i1} %res, 0
  %overflow = extractvalue {i8, i1} %res, 1
  %cmp = icmp ne %mul, 0
  %ret = or i1 %overflow, %cmp
  ret i1 %ret
=>
  %t0 = icmp ne i8 %a, 0
  %t1 = icmp ne i8 %b, 0
  %ret = and i1 %t0, %t1
  ret i1 %ret
  %res = umul_overflow {i8, i1} %a, %b
  %mul = extractvalue {i8, i1} %res, 0
  %cmp = icmp ne %mul, 0
  %overflow = extractvalue {i8, i1} %res, 1

Done: 1
Optimization is correct!

```

Reviewers: nikic, lebedev.ri, spatel, Bigcheese, dexonsmith, aemerson

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D74141
2020-02-18 09:11:55 +01:00
Brian Gesiak 11053a1cc6 Revert new pass manager coro-split and coro-elide
This reverts
https://reviews.llvm.org/rG7125d66f9969605d886b5286780101a45b5bed67 and
https://reviews.llvm.org/rG00fec8004aca6588d8d695a2c3827c3754c380a0 due
to buildbot failures:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/34004
2020-02-17 23:55:10 -05:00
Brian Gesiak 00fec8004a [Coroutines][3/6] New pass manager: coro-elide
Summary:
Depends on https://reviews.llvm.org/D71899.

The third in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements 'coro-elide'.

The new pass manager infrastructure does not implicitly repeat CGSCC
pass pipelines when a function is devirtualized, and so the tests
for the new pass manager that rely on that behavior now explicitly
specify `repeat<2>`.

Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei

Reviewed By: wenlei

Subscribers: wenlei, EricWF, Prazek, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71900
2020-02-17 23:41:57 -05:00
Brian Gesiak 7125d66f99 [Coroutines][2/6] New pass manager: coro-split
Summary:
This patch has four dependencies:

1. The first in this series of patches that implement coroutine passes in the
   new pass manager: https://reviews.llvm.org/D71898.
2. A patch that introduces an API for CGSCC passes to add new reference
   edges to a `LazyCallGraph`, `updateCGAndAnalysisManagerForCGSCCPass`:
   https://reviews.llvm.org/D72025.
3. A patch that introduces a `CallGraphUpdater` helper class that is
   capable of mutating internal `LazyCallGraph` state in order to insert
   new function nodes into a specific SCC: https://reviews.llvm.org/D70927.
4. And finally, a small edge case fix for updating `LazyCallGraph` that
   patch 3 above happens to run into: https://reviews.llvm.org/D72226.

This is the second in a series of patches that ports the LLVM coroutines
passes to the new pass manager infrastructure. This patch implements
'coro-split'.

Some notes:
* Using the new CGSCC pass manager resulted in IR being printed in the
  reverse order in some tests. To prevent FileCheck checks from failing due
  to these reversed orders, this patch splits up test files that test
  multiple different coroutine functions: specifically
  coro-alloc-with-param.ll, coro-split-eh.ll, and coro-eh-aware-edge-split.ll.
* CoroSplit.cpp contained 2 overloads of `splitCoroutine`, one of which
  dispatched to the other based on the coroutine ABI being used (C++20
  switch-based versus Swift returned-continuation-based). I found this
  confusing, especially with the additional branching based on `CallGraph`
  vs. `LazyCallGraph`, so I removed the ABI-checking overload of
  `splitCoroutine`.

Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei

Reviewed By: wenlei

Subscribers: wenlei, qcolombet, EricWF, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71899
2020-02-17 23:35:27 -05:00
Vedant Kumar c74026daf3 [HotColdSplit] Mark entire function cold when entry block is cold
rdar://58855712
2020-02-17 15:57:50 -08:00
Nicolai Hähnle 58297e4d8f LowerMatrixIntrinsics: Avoid use of deprecated CreateCall methods
Reviewers: t.p.northover

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74675
2020-02-18 00:24:09 +01:00
Tim Northover 464d4cf7e6 Coroutines: avoid use of deprecated CreateLoad and CreateCall methods
Summary: Patch originally by Tim Northover

Reviewers: t.p.northover

Subscribers: EricWF, hiraditya, modocache, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74674
2020-02-18 00:24:09 +01:00
Brian Gesiak e9849d5195 [Coroutines][1/6] New pass manager: coro-early
Summary:
The first in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-early'.

NB: All coroutines passes begin by checking that coroutine intrinsics are
declared within the LLVM IR module they're operating on. To do so, they call
`coro::declaresIntrinsics`. The next 3 patches in this series, which add new
pass manager implementations of the 'coro-split', 'coro-elide', and
'coro-cleanup' passes, use a similar pattern as the one used here: a static
function is shared across both old and new passes to detect if relevant
coroutine intrinsics are delcared. To make this pattern easier to read, this
patch adds `const` keywords to the parameters of `coro::declaresIntrinsics`.

Reviewers: GorNishanov, lewissbaker, junparser, chandlerc, deadalnix, wenlei

Reviewed By: wenlei

Subscribers: ychen, wenlei, EricWF, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71898
2020-02-17 13:27:48 -05:00
Nikita Popov 3eaa53e805 Reapply "[IRBuilder] Virtualize IRBuilder"
Relative to the original commit, this fixes some warnings,
and is based on the deletion of the IRBuilder copy constructor
in D74693. The automatic copy constructor would no longer be
safe.

-----

Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html

This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:

1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)

This patch addresses the issue by making the following changes:

* IRBuilderDefaultInserter::InsertHelper becomes virtual.
  IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
 implemented by ConstantFolder (default), NoFolder and TargetFolder.
  IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
  that methods can in the future replace their IRBuilder<> & uses
  (or other specific IRBuilder types) with IRBuilderBase & and thus
  be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
  Essentially it only stores the folder and inserter and takes care
  of constructing the base builder.

What this patch doesn't do, but should be simple followups after this change:

* Fixing use of the inserter for creation methods originally defined
  on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.

From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.

Differential Revision: https://reviews.llvm.org/D73835
2020-02-17 19:04:11 +01:00
Nikita Popov 80397d2d12 [IRBuilder] Delete copy constructor
D73835 will make IRBuilder no longer trivially copyable. This patch
deletes the copy constructor in advance, to separate out the breakage.

Currently, the IRBuilder copy constructor is usually used by accident,
not by intention.  In rG7c362b25d7a9 I've fixed a number of cases where
functions accepted IRBuilder rather than IRBuilder &, thus performing
an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an
IRBuilder was copied, while an InsertPointGuard should have been used
instead.

The only non-trivial use of the copy constructor is the
getIRBForDbgInsertion() helper, for which I separated construction and
setting of the insertion point in this patch.

Differential Revision: https://reviews.llvm.org/D74693
2020-02-17 18:14:48 +01:00
Benjamin Kramer 564a9de28e Hide implementation details. NFC> 2020-02-17 17:55:23 +01:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
Nikita Popov 5f7b92b1b4 [IRBuilder] Prefer InsertPointGuard over full copy; NFC
Don't copy the IRBuilder when an InsertPointGuard would also do.
2020-02-16 18:02:29 +01:00
Nikita Popov 7c362b25d7 [IRBuilder] Fix unnecessary IRBuilder copies; NFC
Fix a few cases where an IRBuilder is passed to a helper function
by value, while a by reference pass was intended.
2020-02-16 17:57:18 +01:00
Nikita Popov af480e8c63 Revert "[IRBuilder] Virtualize IRBuilder"
This reverts commit 0765d3824d.
This reverts commit 1b04866a3d.

Relevant looking crashes observed on:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win
2020-02-16 17:01:10 +01:00
Sanjay Patel 62dd44d76d [VectorCombine] fix cost calc for extract-cmp
getOperationCost() is not the cost we wanted; that's not the
throughput value that the rest of the calculation uses.

We may want to switch everything in this code to use the
getInstructionThroughput() wrapper to avoid these kinds of
problems, but I'll look at that as a follow-up because that
can create other logical diffs via using optional parameters
(we'd need to speculatively create the vector instruction to
make a fair(er) comparison).
2020-02-16 10:40:28 -05:00
Nikita Popov 893c630fbe [InstCombine] Create new log2 intrinsic; NFCI
Rather than mixing creation of new instructions and in-place
modification here, create a new log2 intrinsic. This should be
NFC apart from worklist order changes.
2020-02-16 15:52:09 +01:00
Nikita Popov 1b04866a3d [IRBuilder] Try to fix warnings
Try to fix -Wnon-virtual-dtor warnings that cause build failure
on clang-pcc64le-rhel.
2020-02-16 15:32:11 +01:00
Nikita Popov 0765d3824d [IRBuilder] Virtualize IRBuilder
Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html

This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:

1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)

This patch addresses the issue by making the following changes:

* IRBuilderDefaultInserter::InsertHelper becomes virtual.
  IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
 implemented by ConstantFolder (default), NoFolder and TargetFolder.
  IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
  that methods can in the future replace their IRBuilder<> & uses
  (or other specific IRBuilder types) with IRBuilderBase & and thus
  be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
  Essentially it only stores the folder and inserter and takes care
  of constructing the base builder.

What this patch doesn't do, but should be simple followups after this change:

* Fixing use of the inserter for creation methods originally defined
  on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.

From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.

Differential Revision: https://reviews.llvm.org/D73835
2020-02-16 13:48:55 +01:00
Johannes Doerfert 1d5da8cd30 [Attributor][FIX] Use pointer not reference as it can be null 2020-02-15 20:38:49 -06:00
Florian Hahn f8045b250d Recommit "[SCCP] Remove forcedconstant, go to overdefined instead"
This includes a fix for cases where things get marked as overdefined in
ResolvedUndefsIn, but we later discover a constant. To avoid crashing,
we consistently bail out on overdefined values in the visitors. This is
similar to the previous behavior with forcedconstant.

This reverts the revert commit 02b72f564c.
2020-02-15 18:36:44 +01:00
Simon Pilgrim 8a48c4a97c Fix boolean/bitwise operator precedence warnings. NFCI. 2020-02-15 13:53:18 +00:00
Johannes Doerfert ef746aa11f [Attributor] Collect memory accesses with their respective kind and location
In addition to a single bit per memory locations, e.g., globals and
arguments, we now collect more information about the actual accesses,
e.g., what instruction caused it, was it a read/write/read+write, and
what the underlying base pointer was. Follow up patches will make
explicit use of this.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D73527
2020-02-15 02:12:04 -06:00
Fangrui Song fd5665af2c [Attributor] Fix -Wunused-variable for -DLLVM_ENABLE_ASSERTIONS=off builds after b4352e43d8 2020-02-14 21:47:19 -08:00
Johannes Doerfert b70297a39a [Attributor][FIX] Ensure abstract attributes are existing before manifest
While the function return updateImpl did only look at call sites the
manifest method looked at return values. If we don't do this during the
updateImpl we might create new abstract attributes during manifest. This
is a problem when it comes to liveness information.
2020-02-14 21:44:46 -06:00
Johannes Doerfert ad121ea14d [Attributor] Manifest simplified (return) values properly
If we simplify a function return value we have to modify the return
instructions.
2020-02-14 21:44:46 -06:00
Johannes Doerfert b53af0e7f9 [Attributor][FIX] Collapse `undef` to a proper value
If we see an undef we cannot assume it's the same as "no value". For now
we just collapse it to 0.
2020-02-14 21:44:46 -06:00
Johannes Doerfert 137c99a6a5 [Attributor][FIX] Restrict cross-SCC call deletion
If we know a call was not needed we might have ended up deleting it even
if it was in a different SCC. This prevents us from doing so.
2020-02-14 21:44:46 -06:00
Johannes Doerfert 32e98a7089 [Attributor][FIX] Carefully strip casts in AANoAlias
We can strip casts in AANoAlias but that might cause us to end up with a
non-pointer type. We do properly handle that case now.
2020-02-14 21:44:46 -06:00
Johannes Doerfert b4352e43d8 [Attributor][FIX] Do not RAUW void values
This caused an error when passes iterated over cached assumptions in the
tracker and assumed them to be `null` or an instruction. I failed to
create a test case so far.
2020-02-14 21:44:46 -06:00
Johannes Doerfert 282f5d7ad1 [Attributor] Derive memory location attributes (argmemonly, ...)
In addition to memory behavior attributes (readonly/writeonly) we now
derive memory location attributes (argmemonly/inaccessiblememonly/...).
The former is part of AAMemoryBehavior and the latter part of
AAMemoryLocation. While they are similar in nature it got messy when
they were put in a single AA. Location attributes for arguments and
floating values will follow later.

Note that both memory attributes kinds can derive readnone. If there are
no accesses AAMemoryBehavior will derive readnone. If there are accesses
but only to stack (=local) locations AAMemoryLocation will derive
readnone.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D73426
2020-02-14 19:05:51 -06:00
Johannes Doerfert 7cbb107feb [Attributor][FIX] Validate the type for AAValueConstantRange as needed
Due to the genericValueTraversal we might visit values for which we did
not create an AAValueConstantRange object, e.g., as they are behind a
PHI or select or call with `returned` argument. As a consequence we need
to validate the types as we are about to query AAValueConstantRange for
operands.
2020-02-14 17:22:40 -06:00
Alina Sbirlea 1326a5a4cf [LoopRotate] Get and update MSSA only if available in legacy pass manager.
Summary:
Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408

In the legacy pass manager, loop rotate need not compute MemorySSA when not being in the same loop pass manager with other loop passes.
There isn't currently a way to differentiate between the two cases, so this attempts to limit the usage in LoopRotate to only update MemorySSA when the analysis is already available.
The side-effect of this is that it will split the Loop pipeline.

This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA.

Reviewers: dmgreen, fedor.sergeev, nikic

Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74574
2020-02-14 10:47:26 -08:00
Kadir Cetinkaya 1674f772b4
[VecotrCombine] Fix unused variable for assertion disabled builds 2020-02-14 09:30:29 +01:00
Vedant Kumar 8e77b33b3c [Local] Do not move around dbg.declares during replaceDbgDeclare
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.

Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.

However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.

Testing: stage2 RelWithDebInfo sanitized build, check-llvm

rdar://59397340

Differential Revision: https://reviews.llvm.org/D74517
2020-02-13 14:35:02 -08:00
Sanjay Patel 19b62b79db [VectorCombine] try to form vector binop to eliminate an extract element
binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C

This is a transform that has been considered for canonicalization (instcombine)
in the past because it reduces instruction count. But as shown in the x86 tests,
it's impossible to know if it's profitable without a cost model. There are many
potential target constraints to consider.

We have implemented similar transforms in the backend (DAGCombiner and
target-specific), but I don't think we have this exact fold there either (and if
we did it in SDAG, it wouldn't work across blocks).

Note: this patch was intended to handle the more general case where the extract
indexes do not match, but it got too big, so I scaled it back to this pattern
for now.

Differential Revision: https://reviews.llvm.org/D74495
2020-02-13 17:23:27 -05:00
Vedant Kumar 02b72f564c Revert "Recommit "[SCCP] Remove forcedconstant, go to overdefined instead""
This reverts commit bb310b3f73. This
breaks the stage2 ASan build, see:

https://bugs.llvm.org/show_bug.cgi?id=44898

rdar://59431448
2020-02-13 11:55:18 -08:00
stozer 9bda7ab835 Re-revert: Recover debug intrinsics when killing duplicated/empty blocks
This reverts commit 61b35e4111.

This commit causes a timeout in chromium builds; likely to have a
similar cause to the previous timeout issue caused by this commit (see
6ded69f294 for more details). It is possible that there is no way to
fix this bug that will not cause this issue; further investigations as
to the efficiency of handling large amounts of debug info will be
necessary.
2020-02-13 11:48:19 +00:00
Johannes Doerfert 70cac41a2b Reapply "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late"
Reapply 8a56d64d76 with minor fixes.

The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.

This reverts commit 3aac953afa.
2020-02-12 22:29:07 -06:00
Johannes Doerfert 3aac953afa Revert "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late"
This reverts commit 8a56d64d76.

Will be recommitted once the clang test problem is addressed.
2020-02-12 18:50:43 -06:00
Johannes Doerfert 8a56d64d76 [OpenMP][IRBuilder] Perform finalization (incl. outlining) late
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D74372
2020-02-12 17:55:01 -06:00
Johannes Doerfert 23f41f16d4 [Attributor] Use fine-grained liveness in all helpers
We used coarse-grained liveness before, thus we looked if the
instruction was executed, but we did not use fine-grained liveness,
hence if the instruction was needed or could be deleted even if the
surrounding ones are live. This patches introduces this level of
liveness checks together with other liveness queries, e.g., for uses.

For more control we enforce that all liveness queries go through the
Attributor.

Test have been adjusted to reflect the changes or augmented to prevent
deletion of the parts we want to check.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D73313
2020-02-12 17:36:38 -06:00
Johannes Doerfert b2c76002ca [Attributor] Ignore uses if a value is simplified
If we have a replacement for a value, via AAValueSimplify, the original
value will lose all its uses. Thus, as long as a value is simplified we
can skip the uses in checkForAllUses, given that these uses are
transitive uses for the simplified version and will therefore affect the
simplified version as necessary.

Since this allowed us to remove calls without side-effects and a known
return value, we need to make sure not to eliminate `musttail` calls.
Those we keep around, or later remove the entire `musttail` call chain.
2020-02-12 17:36:38 -06:00
Johannes Doerfert 86509e8c3b [Attributor] Use assumed information to determine side-effects
We relied on wouldInstructionBeTriviallyDead before but that functions
does not take assumed information, especially for calls, into account.
The replacement, AAIsDead::isAssumeSideEffectFree, does.

This change makes AAIsDeadCallSiteReturn more complex as we can have
a dead call or only dead users.

The test have been modified to include a side effect where there was
none in order to keep the coverage.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D73311
2020-02-12 17:36:38 -06:00
Ehud Katz d8a2ea9fd5 [LoopExtractor] Fix legacy pass dependencies
Fixes a memory leak of allocating `LoopInfoWrapperPass` and `DominatorTreeWrapperPass`.
2020-02-12 22:39:21 +02:00
Vedant Kumar 34d9f93977 [AddressSanitizer] Ensure only AllocaInst is passed to dbg.declare
Various parts of the LLVM code generator assume that the address
argument of a dbg.declare is not a `ptrtoint`-of-alloca. ASan breaks
this assumption, and this results in local variables sometimes being
unavailable at -O0.

GlobalISel, SelectionDAG, and FastISel all do not appear to expect
dbg.declares to have a `ptrtoint` as an operand. This means that they do
not place entry block allocas in the usual side table reserved for local
variables available in the whole function scope. This isn't always a
problem, as LLVM can try to lower the dbg.declare to a DBG_VALUE, but
those DBG_VALUEs can get dropped for all the usual reasons DBG_VALUEs
get dropped. In the ObjC test case I'm looking at, the cause happens to
be that `replaceDbgDeclare` has hoisted dbg.declares into the entry
block, causing LiveDebugValues to "kill" the DBG_VALUEs because the
lexical dominance check fails.

To address this, I propose:

1) Have ASan (always) pass an alloca to dbg.declares (this patch). This
   is a narrow bugfix for -O0 debugging.

2) Make replaceDbgDeclare not move dbg.declares around. This should be a
   generic improvement for optimized debug info, as it would prevent the
   lexical dominance check in LiveDebugValues from killing as many
   variables.

   This means reverting llvm/r227544, which fixed an assertion failure
   (llvm.org/PR22386) but no longer seems to be necessary. I was able to
   complete a stage2 build with the revert in place.

rdar://54688991

Differential Revision: https://reviews.llvm.org/D74369
2020-02-12 11:24:02 -08:00
Jay Foad 32aac25637 [KnownBits] Introduce anyext instead of passing a flag into zext
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.

I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.

NFC.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74482
2020-02-12 19:06:53 +00:00
Florian Hahn bb310b3f73 Recommit "[SCCP] Remove forcedconstant, go to overdefined instead"
This version includes a fix for a set of crashes caused by marking
values depending on a yet unknown & tracked call as overdefined.

In some cases, we would later discover that the call has a constant
result and try to mark a user of it as constant, although it was already
marked as overdefined. Most instruction handlers bail out early if the
instruction is already overdefined. But that is not necessary for
CastInsts for example. By skipping values that depend on skipped
calls, we resolve the crashes and also improve the precision in some
cases (see resolvedundefsin-tracked-fn.ll).

Note that we may not skip PHI nodes that may depend on a skipped call,
but they can be safely marked as overdefined, as we bail out early if
the PHI node is overdefined.

This reverts the revert commit
a74b31a3e9cd844c7ce2087978568e3f5ec8519.
2020-02-12 18:02:18 +00:00
Anh Tuyen Tran a5b6480d05 [NFC] Remove extra headers included in Loop Unroll and LoopUnrollAndJam files
Summary:
This refactor patch removes some header files which are not needed and also add some to meet IWYU principles.

Reviewers: rnk (Reid Kleckner), Meinersbur (Michael Kruse), dmgreen (Dave Green)

Reviewed By: dmgreen (Dave Green), rnk (Reid Kleckner), Meinersbur (Michael Kruse)

Subscribers: dmgreen (Dave Green), Whitney (Whitney Tsang), hiraditya (Aditya Kumar), zzheng (Z. Zheng), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D73498
2020-02-12 17:57:56 +00:00
Alina Sbirlea 4f33a68973 Compute ORE, BPI, BFI in Loop passes.
Summary:
Passes ORE, BPI, BFI are not being preserved by Loop passes, hence it
is incorrect to retrieve these passes as cached.
This patch makes the loop passes in question compute a new instance.

In some of these cases, however, it may be beneficial to change the Loop pass to
a Function pass instead, similar to the change for LoopUnrollAndJam.

Reviewers: chandlerc, dmgreen, jdoerfert, reames

Subscribers: mehdi_amini, hiraditya, zzheng, steven_wu, dexonsmith, Whitney, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72891
2020-02-12 09:15:18 -08:00
Sven van Haastregt 665dcdacc0 Add missing newlines at EOF; NFC 2020-02-12 15:57:25 +00:00
stozer 61b35e4111 Re-reapply: Recover debug intrinsics when killing duplicated/empty blocks
This reverts commit 636c93ed11.

The original patch caused build failures on TSan buildbots. Commit 6ded69f294
fixes this issue by reducing the rate at which empty debug intrinsics
propagate, reducing the memory footprint and preventing a fatal spike.
2020-02-12 14:36:30 +00:00
Florian Hahn 81dbb6aec6 Recommit "[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk)."
This includes a fix for the santizier failures.

This reverts the revert commit
42f8b915eb.
2020-02-12 14:17:50 +00:00
Ayman Musa 35f02aa021 Revert "[AggressiveInstCombine] Add support for ICmp instr that feeds a select intsr's condition operand."
This reverts commit cf155150f9.
2020-02-12 15:04:49 +02:00
Ayman Musa cf155150f9 [AggressiveInstCombine] Add support for ICmp instr that feeds a select intsr's condition operand. 2020-02-12 15:01:27 +02:00
stozer ffeb64db35 Reapply "[DebugInfo] Prevent explosion of debug intrinsics during jump threading"
This reverts commit 6ded69f294.
2020-02-12 12:39:54 +00:00
Ayman Musa 3bda9059b8 [AggressiveInstCombine] Add support for select instruction.
Differential Revision: https://reviews.llvm.org/D72837
2020-02-12 13:59:34 +02:00
stozer 6ded69f294 Revert "[DebugInfo] Prevent explosion of debug intrinsics during jump threading"
This reverts commit fe6f6cd6b8.

Found test failure on several buildbots.
2020-02-12 11:48:00 +00:00
Ayman Musa 49a4d85f6d [NFC][AggressiveInstCombine] Remove redundant std::max.
Differential Revision: https://reviews.llvm.org/D74476
2020-02-12 13:47:40 +02:00
stozer fe6f6cd6b8 [DebugInfo] Prevent explosion of debug intrinsics during jump threading
This patch is a fix following the revert of 72ce759
(https://reviews.llvm.org/rG72ce759928e6dfee6a9efa310b966c19722352ba)
and fixes the failure that it caused.

The above patch failed on the Thread Sanitizer buildbot with an out of
memory error. After an investigation, the cause was identified as an
explosion in debug intrinsics while running the Jump Threading pass on
ModuleMap.ll. The above patched prevented debug intrinsics from being
dropped when their Basic Block was deleted due to being "empty". In this
case, one of the functions in ModuleMap.ll had (after many optimization
passes) a very large number of debug intrinsics representing a set of
repeatedly inlined variables. Previously the vast majority of these were
silently dropped during Jump Threading when their blocks were deleted,
but as of the above patch they survived for longer, causing a large
increase in the number of debug intrinsics. These intrinsics were then
repeatedly cloned by the Jump Threading pass as edges were threaded,
multiplying the intrinsic count further. The memory consumed by this
process spiralled out of control, crashing the buildbot that uses TSan
(which has an estimated 5-10x memory overhead compared to non-sanitized
builds).

This patch adds RemoveRedundantDbgInstrs to the Jump Threading pass, in
order to reduce the number of debug intrinsics down to a manageable
amount in cases where many intrinsics for the same variable end up
bunched together contiguously, as in this case.

Differential Revision: https://reviews.llvm.org/D73054
2020-02-12 11:22:54 +00:00
Florian Hahn fa74b31a3e Revert "[SCCP] Remove forcedconstant, go to overdefined instead"
This causes a crash for the reproducer below

 enum { a };
 enum b { c, d };
 e;
 static _Bool g(struct f *h, enum b i) {
   i &&j();
   return a;
 }
 static k(char h, enum b i) {
   _Bool l = g(e, i);
   l;
 }
 m(h) {
   k(h, c);
   g(h, d);
 }

This reverts commit aadb635e04.
2020-02-12 09:41:19 +00:00
Matt Arsenault 86f9117d47 AMDGPU: Don't report 2-byte alignment as fast
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.

Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
2020-02-11 18:35:00 -05:00
Johannes Doerfert 52aec3221f [Attributor][NFC] Clarify the documentation a bit more 2020-02-11 15:11:55 -06:00
Johannes Doerfert 8e62968d45 [Attributor] Identify dead uses in PHIs (almost) based on dead edges
As an approximation to a dead edge we can check if the terminator is
dead. If so, the corresponding operand use in a PHI node is dead even if
the PHI node itself is not.
2020-02-11 15:11:55 -06:00
Teresa Johnson 80d0a137a5 Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP"
This restores commit 748bb5a0f1, along
with a fix for a Chromium test suite build issue (and a new test for
that case).

Differential Revision: https://reviews.llvm.org/D73242
2020-02-11 10:48:05 -08:00
Johannes Doerfert 185e9b083e [Attributor][NFC] Improve documentation 2020-02-11 11:19:34 -06:00
Johannes Doerfert f95553923f [Attributor] Return uses do not free pointers
If a pointer is returned that does not mean it is freed in the current
(function) scope. We can ignore such uses in AANoFree.
2020-02-11 11:02:59 -06:00
Johannes Doerfert 4c62a35860 [Attributor][FIX] Remove duplicate, half-broken functionality
The changeXXXAfterManifest functions are better suited to deal with
changes so we should prefer them. These functions also recursively
delete dead instructions which is why we see test changes.
2020-02-11 11:02:59 -06:00
Johannes Doerfert 77a9e61c9a [Attributor][NFC] Improve debug message 2020-02-11 11:02:59 -06:00
Nikita Popov 5a8819b216 [InstCombine] Use replaceOperand() in more places
This is a followup to D73803, which uses the replaceOperand()
helper in more places.

This should be NFC apart from changes to worklist order.

Differential Revision: https://reviews.llvm.org/D73919
2020-02-11 17:38:23 +01:00
Florian Hahn aadb635e04 [SCCP] Remove forcedconstant, go to overdefined instead
This patch removes forcedconstant to simplify things for the
move to ValueLattice, which includes constant ranges, but no
forced constants.

This patch removes forcedconstant and changes ResolvedUndefsIn
to mark instructions with unknown operands as overdefined. This
means we do not do simplifications based on undef directly in SCCP
any longer, but this seems to hardly come up in practice (see stats
below), presumably because InstCombine & others take care
of most of the relevant folds already.

It is still beneficial to keep ResolvedUndefIn, as it allows us delaying
going to overdefined until we propagated all known information.

I also built MultiSource, SPEC2000 and SPEC2006 and compared
sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact
is quite low:

Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.IPNumInstRemoved

Program                                        base     patch    diff
 test-suite...arks/VersaBench/dbms/dbms.test     4.00    3.00  -25.0%
 test-suite...TimberWolfMC/timberwolfmc.test    38.00   34.00  -10.5%
 test-suite...006/453.povray/453.povray.test   158.00  155.00  -1.9%
 test-suite.../CINT2000/176.gcc/176.gcc.test   668.00  668.00   0.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test   1209.00 1209.00  0.0%
 test-suite...arks/mafft/pairlocalalign.test    76.00   76.00   0.0%

Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.NumInstRemoved

Program                                        base    patch     diff
 test-suite...arks/mafft/pairlocalalign.test   185.00  175.00  -5.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   2059.00 2056.00 -0.1%
 test-suite.../CINT2000/176.gcc/176.gcc.test   2358.00 2357.00 -0.0%
 test-suite...006/453.povray/453.povray.test   317.00  317.00   0.0%
 test-suite...TimberWolfMC/timberwolfmc.test    12.00   12.00   0.0%

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61314
2020-02-11 15:24:15 +00:00
Kadir Cetinkaya 42f8b915eb
Revert "[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk)."
This reverts commit d0c4d4fe09.

Revert "[DSE,MSSA] Move more passing test cases from todo to simple.ll."

This reverts commit 02266e64bb.

Revert "[DSE,MSSA] Adjust mda-with-dbg-values.ll to MSSA backed DSE."

This reverts commit 74f03e4ff0.
2020-02-11 15:34:48 +01:00
Sanjay Patel a2a0f9a43a [VectorCombine] remove unused debug counter; NFC
The variable was added to the initial commit via copy/paste of existing
code, but it wasn't actually used in the code. We can add it back with
the proper usage if/when that is needed.
2020-02-11 08:24:07 -05:00
Sanjay Patel b8ebc11f03 [EarlyCSE] avoid crashing when detecting min/max/abs patterns (PR41083)
As discussed in PR41083:
https://bugs.llvm.org/show_bug.cgi?id=41083
...we can assert/crash in EarlyCSE using the current hashing scheme and
instructions with flags.

ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or
other flags when detecting patterns such as min/max/abs composed of
compare+select. But the value numbering / hashing mechanism used by
EarlyCSE intersects those flags to allow more CSE.

Several alternatives to solve this are discussed in the bug report.
This patch avoids the issue by doing simple matching of min/max/abs
patterns that never requires instruction flags. We give up some CSE
power because of that, but that is not expected to result in much
actual performance difference because InstCombine will canonicalize
these patterns when possible. It even has this comment for abs/nabs:

  /// Canonicalize all these variants to 1 pattern.
  /// This makes CSE more likely.

(And this patch adds PhaseOrdering tests to verify that the expected
transforms are still happening in the standard optimization pipelines.

I left this code to use ValueTracking's "flavor" enum values, so we
don't have to change the callers' code. If we decide to go back to
using the ValueTracking call (by changing the hashing algorithm
instead), it should be obvious how to replace this chunk.

Differential Revision: https://reviews.llvm.org/D74285
2020-02-10 17:25:34 -05:00
Hiroshi Yamauchi bb383ae612 [CallPromotionUtils] Add tryPromoteCall.
Summary: It attempts to devirtualize a call on alloca through vtable loads.

Reviewers: davidxl

Subscribers: mgorny, Prazek, hiraditya, llvm-commits

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
2020-02-10 13:43:16 -08:00
Sanjay Patel 62ce7e650a [InstCombine] fix use check when canonicalizing abs/nabs
We were checking for extra uses of the negated operand even
if we were not going to create it as part of this canonicalization.

This was showing up as a regression when we limit EarlyCSE as
proposed in D74285.
2020-02-10 14:57:37 -05:00
David Stenberg 982944525c Revert "[InstCombine][DebugInfo] Fold constants wrapped in metadata"
This reverts commit b54a8ec1bc.

The commit triggered debug invariance (different output with/without
-g). The patch seems to have exposed a pre-existing invariance problem
in GlobalOpt, which I'll write a bug report for.
2020-02-10 17:58:33 +01:00
Bill Wendling c55cf4afa9 Revert "Remove redundant "std::move"s in return statements"
The build failed with

  error: call to deleted constructor of 'llvm::Error'

errors.

This reverts commit 1c2241a793.
2020-02-10 07:07:40 -08:00
Bill Wendling 1c2241a793 Remove redundant "std::move"s in return statements 2020-02-10 06:39:44 -08:00
Mikael Holmen a50c0b0df7 Fix compiler warning when compiling without asserts [NFC] 2020-02-10 13:55:52 +01:00
Florian Hahn d0c4d4fe09 [DSE] Add first version of MemorySSA-backed DSE (Bottom up walk).
This patch adds a first version of a MemorySSA based DSE. It is missing
a lot of features, which will get added as follow-ups, to help to keep
the review manageable.

The patch uses the following general approach: given a MemoryDef, walk
upwards to find clobbering MemoryDefs that may be killed by the
starting def. Then check that there are no uses that may read the
location of the original MemoryDef in between both MemoryDefs. A bit
more concretely:

For all MemoryDefs StartDef:
1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
2. Check that there no reads between DomAccess and the StartDef by checking
   all uses starting at DomAccess and walking until we see StartDef.
3. For each found DomDef, check that:
  1. There are no barrier instructions between DomDef and StartDef (like
     throws or stores with ordering constraints).
  2. StartDef is executed whenever DomDef is executed.
3. StartDef completely overwrites DomDef.
4. Erase DomDef from the function and MemorySSA.

The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: We only allow accesses to stack
objects, access that are in the same basic block if the block does not
contain any throwing instructions or accesses in functions that do
not contain any throwing instructions. This will get lifted later.

Besides adding support for the missing cases, there is plenty of additional
potential for improvements as follow-up work, e.g. the way we visit stores
(could be just a traversal of the MemorySSA, rather than collecting them
up-front), using the alias information discovered during walking to optimize
the MemorySSA.

This is loosely based on D40480 by Dave Green.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72700
2020-02-10 11:52:11 +00:00
Florian Hahn da52b9c118 [DSE] Add tests for MemorySSA based DSE.
This copies the DSE tests into a MSSA subdirectory to test the MemorySSA
backed DSE implementation, without disturbing the original tests.

Differential Revision: https://reviews.llvm.org/D72145
2020-02-10 10:28:43 +00:00
Johannes Doerfert 87ddf1f4fa [Attributor] Simple casts preserve no-alias property
This is a minimal but important advancement over the existing code. A
cast with an operand that is only used in the cast retains the no-alias
property of the operand.
2020-02-10 01:11:32 -06:00
Johannes Doerfert 8155439331 [Attributor] Allow PHI nodes in AAValueConstantRangeFloating
Traversing PHI nodes is natural with the genericValueTraversal but also
a bit tricky. The problem is similar to the ones we have seen in AAAlign
and AADereferenceable, namely that we continue to increase the range in
each iteration. We use a pessimistic approach here to stop the
iterations. Nevertheless, optimistic information can now be propagated
through a PHI node.
2020-02-10 00:55:10 -06:00
Johannes Doerfert 63adbb9a0e [Attributor][FIX] Remove FIXME that seems outdated
The change is performed as stated by the FIXME and the tests are
adjusted. All changes look fine to me and values can be inferred as
undef without it being an error.
2020-02-10 00:55:10 -06:00
Johannes Doerfert 7e7e6594b3 [Attributor] Allow SelectInst in AAValueConstantRangeFloating
The genericValueTraversal will already handle SelectInst properly and we
just needed to allow them in the initialize method.
2020-02-10 00:55:09 -06:00
Johannes Doerfert ffdbd2a06c [Attributor] Look through (some) casts in AAValueConstantRangeFloating
Casts can be handled natively by the ConstantRange class. We do limit it
to extends for now as we assume an integer type in different locations.
A TODO and a test case with a FIXME was added to remove that restriction
in the future.
2020-02-10 00:38:01 -06:00
Johannes Doerfert 028db8c490 [Attributor][FIX] Call right base method in AAValueConstantRangeFloating
We now call the base class method as we should.
2020-02-10 00:38:01 -06:00
Michael Liao ab3da5dd66 Fix `-Wparentheses` warning. NFC. 2020-02-10 00:45:02 -05:00
Sanjay Patel a17f03bd93 [VectorCombine] new IR transform pass for partial vector ops
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480
2020-02-09 10:04:41 -05:00
Ehud Katz 3b70ee27a5 [LoopExtractor] Convert LoopExtractor from LoopPass to ModulePass
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.

Fixes PR3082 and PR8929.

Differential Revision: https://reviews.llvm.org/D69069
2020-02-09 12:25:21 +02:00
Johannes Doerfert b0c77c36d2 [Attributor] Add an Attributor CGSCC pass and run it
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.

The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D70767
2020-02-08 21:27:34 -06:00
Johannes Doerfert e565db49c6 [OpenMP][Opt] Delete terminating and read-only parallel regions
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69954
2020-02-08 18:52:04 -06:00
Johannes Doerfert e28936f613 [OpenMP][Opt] Annotate known runtime functions and deduplicate more
This adds ~27 more runtime calls to the OpenMPKinds.def file, all with
attributes. We deduplicate 16 of those automatically in function =
thread scope. And we annotate all of them automatically during the
OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to
track annotation coverage is included.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69984
2020-02-08 18:35:39 -06:00
Nikita Popov a05932931c [InstCombine] Refactor foldICmpAndShift(); NFCI
Separate out handling for shl, lshr and ashr. The combined handling
obscured some overly pessimistic requirements for the transform.
2020-02-08 22:27:43 +01:00
Johannes Doerfert 9548b74a83 [OpenMP] Introduce the OpenMPOpt transformation pass
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.

The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.

This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69930
2020-02-08 14:47:03 -06:00
Johannes Doerfert 72277ecd62 Introduce a CallGraph updater helper class
The CallGraphUpdater is a helper that simplifies the process of updating
the call graph, both old and new style, while running an CGSCC pass.

The uses are contained in different commits, e.g. D70767.

More functionality is added as we need it.

Reviewed By: modocache, hfinkel

Differential Revision: https://reviews.llvm.org/D70927
2020-02-08 14:16:48 -06:00
George Burgess IV f8c9ceb1ce [SimplifyLibCalls] Add __strlen_chk.
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.

Differential Revision: https://reviews.llvm.org/D74079
2020-02-08 11:51:00 -08:00
Nikita Popov a148b9e990 [InstCombine] Fix infinite min/max canonicalization loop (PR44541)
While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541,
it does so in a more roundabout manner and there might be other
loopholes to trigger the same issue. This is a more direct fix,
that prevents the transform if the min/max is based on a
non-canonical sub X, 0 instruction.

Differential Revision: https://reviews.llvm.org/D73849
2020-02-08 20:42:17 +01:00
Nikita Popov 5b2b67be8e [InstCombine] Remove unnecessary worklist push; NFCI
This is no longer needed after d4627b90a0,
should have dropped it there...
2020-02-08 17:09:28 +01:00
Nikita Popov d4627b90a0 [InstCombine] Avoid modifying instructions in-place
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.

This tends to be more readable and less prone to worklist management
bugs.

Test changes are only superficial (instruction naming and order).
2020-02-08 17:05:56 +01:00
Nikita Popov 9d03b7d0d0 [InstCombine] Use swapValues(); NFC
Less code, and makes it more obvious that these operands do not
need to be added back to the worklist.
2020-02-08 16:57:28 +01:00
Nikita Popov 23db9724d0 [InstCombine] Fix infinite loop in min/max load/store bitcast combine (PR44835)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform
if it wouldn't actually do anything (apart from removing and reinserting
the same instructions).

Note that the test case doesn't loop on current master anymore, only
on the LLVM 10 release branch. The issue is already mitigated on master
due to worklist order fixes, but we should fix the root cause there as well.

As a side note, we should probably assert in combineLoadToNewType()
that it does not combine to the same type. Not doing this here, because
this assertion would also be triggered in another place right now.

Differential Revision: https://reviews.llvm.org/D74278
2020-02-08 16:55:22 +01:00
Akira Hatanaka 4dcc029edb [ObjC][ARC] Keep track of phis that have been discovered to avoid an
infinite loop

This fixes a bug introduced in 6770fbb314.

rdar://problem/59137105
2020-02-07 20:33:11 -08:00
Akira Hatanaka 6770fbb314 [ObjC][ARC] Delete ARC runtime calls that take inert phi values
This improves on the following patch, which removed ARC runtime calls
taking inert global variables:

https://reviews.llvm.org/D62433

rdar://problem/59137105
2020-02-07 16:31:36 -08:00
Hiroshi Yamauchi 4ed205c816 [PGO][PGSO] Enable profile guided size optimization for non-cold code under instrumentation PGO.
Summary:
This enables it for large working set size cases only.

This does not enable it under sample PGO.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74073
2020-02-06 10:29:01 -08:00
Denis Antrushin 99a6e405ed [IRCE] Use SCEVExpander to modify loop bound
IRCE pass checks that it can calculate loop bounds by checking
SCEV availability at loop entry. However it is possible that loop
bound SCEV is loop invariant, but instruction used to compute it
resides within loop. In such case adjusting loop bound in preheader
using IRBuilder leads to malformed SSA.
Use SCEVExpander instead to generate proper instructions.

Reviewed-by: mkazantsev
Differential Revision: https://reviews.llvm.org/D73496
2020-02-06 12:44:43 +03:00
Teresa Johnson 25aa2eef99 Revert "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP"
This reverts commit 748bb5a0f1.

Due to Chromium CFI+ThinLTO test crashes reported on patch.
2020-02-05 19:27:32 -08:00
Juneyoung Lee 5687acf431 [MemCpyOpt] Simplify find*Alignment 2020-02-06 06:42:07 +09:00
Juneyoung Lee ad9ae6ee2b MemCpyOpt cannot use ABI alignment even if it was not given
Summary: This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44388 which incorrectly assigns an ABI alignment to memset when there was no explicit alignment given.

Reviewers: gchatelet, lenary, nikic

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74083
2020-02-06 06:21:55 +09:00
Hiroshi Yamauchi b70f23f599 [PGO][PGSO] Tune flags for profile guided size optimization.
Summary:
Tune the profile threshold flag value for instrumentation PGO based on internal
benchmarks.

Also, add flags to allow profile guided size optimizations for non-cold code
to be enabled separately for instrumentation and sample PGSO.

Neither changes the default behavior (yet) as it's disabled for non-cold code.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72937
2020-02-05 09:37:32 -08:00
Kazu Hirata 4698bf145d Resubmit^2: [JumpThreading] Thread jumps through two basic blocks
This reverts commit 41784bed01.

Since the original revision ead815924e,
this revision fixes three issues:

- This revision fixes the Windows build.  My original patch improperly
  copied EH pads on Windows.  This patch disregards jump threading
  opportunities having to do with EH pads.

- This revision fixes jump threading to a wrong destination.
  Specifically, my original patch treated any Constant other than 0 as 1
  while evaluating the branch condition.  This bug led to treating
  constant expressions like:

    icmp ugt i8* null, inttoptr (i64 4 to i8*)

  to "true".  This patch fixes the bug by calling isOneValue.

- This revision fixes the cost calculation of two basic blocks being
  threaded through.  Note that getJumpThreadDuplicationCost returns
  "(unsigned)~0" for those basic blocks that cannot be duplicated.  If
  we sum of two return values from getJumpThreadDuplicationCost, we
  could have an unsigned overflow like:

    (unsigned)~0 + 5 = 4

  and mistakenly determine that it's safe and profitable to proceed
  with the jump threading opportunity.  The patch fixes the bug by
  checking each return value before summing them up.

[JumpThreading] Thread jumps through two basic blocks

Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

  bb3:
    %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above.  Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

  bb3:
    %var = phi i32* [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb3.dup:
    %var = phi i32* [ null, %bb1 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70247
2020-02-05 09:23:37 -08:00
Alina Sbirlea 67904db23c [IRCE] Make IRCE a Function pass.
Summary: Make InductiveRangeCheckElimination a FunctionPass.

Reviewers: reames, mkazantsev

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73592
2020-02-05 09:22:41 -08:00
Teresa Johnson 748bb5a0f1 [WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.

This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).

This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.

Depends on D71913.

Reviewers: pcc, evgeny777

Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73242
2020-02-05 08:59:48 -08:00
Alina Sbirlea 1c03cc5a39 [NFCI] Update according to style.
clang-tidy + clang-format
2020-02-04 17:11:36 -08:00
Sanjay Patel dc42ff6697 [InstCombine] add FIXME comment to shuffle transform; NFC
Existing tests:
rG5d04e008f708
rG2a191cf8500f
...should verify that the underlying analysis doesn't improve
too much without updating this user code.
2020-02-04 13:02:06 -05:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Sanjay Patel 0cf0be993c [InstCombine] fix operands of shouldChangeType() for casted phi transform
This is a bug noted in the recent D72733 and seen
in the similar transform just above the changed source code.

I added tests with illegal types and zexts to show the bug -
we could transform legal phi ops to illegal, etc. I did not add
tests with trunc because we won't see any diffs on those patterns.
That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to
do those transforms independently of datalayout. It can also create
more casts than are present in existing code.

There are some existing regression tests that do not include a
datalayout that would be altered by this fix. I assumed that the
lack of a datalayout in those regression files is an oversight, so
I added the minimal layout (make i32 legal) necessary to preserve
behavior on those tests.

Differential Revision: https://reviews.llvm.org/D73907
2020-02-04 07:45:48 -05:00
Thomas Raoux e53bbf1213 [GVN] Add GVNOption to control load-pre more fine-grained.
Adds the global (cl::opt) GVNOption enable-load-in-loop-pre in order
to control whether the optimization will be performed if the load
is part of a loop.

Patch by Hendrik Greving!

Differential Revision: https://reviews.llvm.org/D73804
2020-02-03 23:00:58 -08:00
Tyker 15f54d348b [NFC] Factor out function to detect if an attribute has an argument. 2020-02-03 22:27:24 +01:00
Alina Sbirlea 388de9dfcd [LoopUtils] Make duplicate method a utility. [NFCI]
Summary:
Method appendLoopsToWorklist is duplicate in LoopUnroll and in the
LoopPassManager as an internal method. Make it an utility.

Reviewers: dmgreen, chandlerc, fedor.sergeev, yamauchi

Subscribers: mehdi_amini, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73569
2020-02-03 10:24:18 -08:00
Nikita Popov 575a975afd [SimplifyLibCalls] Remove unused IRBuilder argument; NFC
isLocallyOpenedFile() does not use IRBuilder.
2020-02-03 19:12:57 +01:00
Nikita Popov 878cb38a5c [InstCombine] Add replaceOperand() helper
Adds a replaceOperand() helper, which is like Instruction.setOperand()
but adds the old operand to the worklist. This reduces the amount of
missing or incorrect worklist management.

This only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.

Differential Revision: https://reviews.llvm.org/D73803
2020-02-03 19:00:17 +01:00
Nikita Popov e6c9ab4fb7 [InstCombine] Rename worklist methods; NFC
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.

As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.

Differential Revision: https://reviews.llvm.org/D73745
2020-02-03 18:56:51 +01:00
Nikita Popov a59954051e [InstCombine] Fix unused variable warning; NFC 2020-02-03 18:47:38 +01:00
Teresa Johnson bed4d9c897 [ThinLTO] More efficient export computation (NFC)
Summary:
A recent change to enable more importing of global variables with
references exposed some efficiency issues with export computation.
See D73724 for more information and detailed analysis.

The first was specific to variable importing. The code was marking every
copy of a referenced value (from possibly thousands of files in the case
of linkonce_odr) as exported, and we only need to mark the copy in the
module containing the variable def being imported as exported. The
reason is that this is tracking what values are newly exported as a
result of importing. Anything that was defined in another module and
simply used in the exporting module is already exported, and would have
been identified by the caller (e.g. the LTO API implementations).

The second issue is that the code was re-adding previously exported
values (along with all references). It is easy to identify when a
variable was already imported into the same module (via the
import list insert call return value), and we already did this for
function importing. However, what we weren't doing for either function
or variable importing was avoiding a re-insertion when it was previously
exported into a different importing module. The reason we couldn't do
this is there was no way of telling from the export list whether it was
previously inserted there because its definition was exported (in which
case we already marked all its references as exported) from when it was
inserted there because it was referenced by another exported value (in
which case we haven't yet inserted its own references).

To address this we can restructure the way the export list is
constructed. This patch only adds the actual imported definitions
(variable or function) to the export list for its module during the
import computation. After import computation is complete, where we were
already post-processing the export list we go ahead and add all
references made by those exported values to the export list.

These changes speed up the thin link not only with constant variable
importing enabled, but also without (due to the efficiency improvement
in function importing).

Some thin link user time measurements for one large application, average
of 5 runs:

With constant variable importing enabled:
- without this patch: 479.5s
- with this patch: 74.6s

Without constant variable importing enabled:
- without this patch: 80.6s
- with this patch: 70.3s

Note I have not re-enabled constant variable importing here, as I would
like to do additional compile time measurements with these fixes first.

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73851
2020-02-03 09:15:33 -08:00
Sanjay Patel e78fb556c5 [InstCombine] reassociate splatted vector ops
bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp

This patch depends on the splat analysis enhancement in D73549.
See the test with comment:
; Negative test - mismatched splat elements
...as the motivation for that first patch.

The motivating case for reassociating splatted ops is shown in PR42174:
https://bugs.llvm.org/show_bug.cgi?id=42174

In that example, a slight change in order-of-associative math results
in a big difference in IR and codegen. This patch gets all of the
unnecessary shuffles out of the way, but doesn't address the potential
scalarization (see D50992 or D73480 for that).

Differential Revision: https://reviews.llvm.org/D73703
2020-02-03 09:08:36 -05:00
Sam Parker 2663a25fad [JumpThreading] Half the duplicate threshold at Oz
Duplicating instructions can lead to code size increases but using
a threshold of 3 is good for reducing code size.

Differential Revision: https://reviews.llvm.org/D72916
2020-02-03 08:40:20 +00:00
Johannes Doerfert 26d02b0f28 [Attributor] AANoRecurse check all call sites for `norecurse`
If all call sites are in `norecurse` functions we can derive `norecurse`
as the ReversePostOrderFunctionAttrsPass does. This should make
ReversePostOrderFunctionAttrsLegacyPass obsolete once the Attributor is
enabled.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D72017
2020-02-02 23:57:17 -06:00
Johannes Doerfert 368f7ee7a5 [Attributor] Propagate known information from `checkForAllCallSites`
If we know that all call sites have been processed we can derive an
early fixpoint. The use in this patch is likely not to trigger right now
but a follow up patch will make use of it.

Reviewed By: uenoku, baziotis

Differential Revision: https://reviews.llvm.org/D72016
2020-02-02 23:57:17 -06:00
Juneyoung Lee 578d2e2cb1 [llvm-extract] Add -keep-const-init commandline option
Summary:
This adds -keep-const-init option to llvm-extract which preserves initializers of
used global constants.

For example:

```
$ cat a.ll
@g = constant i32 0
define i32 @f() {
  %v = load i32, i32* @g
  ret i32 %v
}

$ llvm-extract --func=f a.ll -S -o -
@g = external constant i32
define i32 @f() { .. }

$ llvm-extract --func=f a.ll -keep-const-init -S -o -
@g = constant i32 0
define i32 @f() { .. }
```

This option is useful in checking whether a function that uses a constant global is optimized correctly.

Reviewers: jsji, MaskRay, david2050

Reviewed By: MaskRay

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73833
2020-02-03 14:30:28 +09:00
Johannes Doerfert 342357c568 [Inliner][NoAlias] Use call site attributes too
If we had `noalias` on an argument the inliner created alias scope
metadata already. However, the call site `noalias` annotation was not
considered. Since the Attributor can derive such call site `noalias`
annotation we should treat them the same as argument annotations.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D73528
2020-02-02 23:21:29 -06:00
Tyker a7bbe45a3e Build assume from call
Fix attempt

this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
2020-02-02 19:43:36 +01:00
Tyker 7cb5d96fbe Revert "[WIP] Build assume from call"
casued buildbot failure

This reverts commit 8ebe001553.
2020-02-02 18:35:19 +01:00
Tyker 8ebe001553 [WIP] Build assume from call
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72475
2020-02-02 18:15:50 +01:00
Tyker c2d0336208 Revert "[WIP] Build assume from call"
caused build bot failure

This reverts commit 780d2c532f.
2020-02-02 18:09:06 +01:00
Tyker 780d2c532f [WIP] Build assume from call
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72475
2020-02-02 17:54:31 +01:00
Tyker ad8ffc5010 Revert "[WIP] Build assume from call"
This reverts commit 355e4bfd78.
2020-02-02 17:49:23 +01:00
Tyker 355e4bfd78 [WIP] Build assume from call
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72475
2020-02-02 17:17:46 +01:00
Tyker 0adda3df92 Revert "[WIP] Build assume from call"
This reverts commit 2ff5602cb5.
2020-02-02 15:05:33 +01:00
Tyker d431c5d9af Revert "[NFC] Factor out function to detect if an attribute has an argument."
This reverts commit ff1b9add2f.
2020-02-02 15:03:06 +01:00
Tyker ff1b9add2f [NFC] Factor out function to detect if an attribute has an argument.
Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72884
2020-02-02 14:50:31 +01:00
Tyker 2ff5602cb5 [WIP] Build assume from call
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72475
2020-02-02 14:50:31 +01:00
Fangrui Song ba3a1774a9 [Transforms] Simplify with make_early_inc_range 2020-02-02 00:54:32 -08:00
Artur Pilipenko 34547ac959 NFC. Comments cleanup in DSE::memoryIsNotModifiedBetween
Separated from https://reviews.llvm.org/D68006 review.
2020-01-31 15:22:33 -08:00
Nikita Popov ff17da3f75 [InstCombine] Push negation through multiply (PR44234)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44234 by adding
multiply support to freelyNegateValue(). Only one of the operands
needs to be negatible, so this still fits within the framework.

Differential Revision: https://reviews.llvm.org/D73410
2020-01-31 20:58:55 +01:00
Hiroshi Yamauchi ac8da31a0f [PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.

Add a variant for MBFIWrapper in the PGSO query interface.

Depends on D73494.
2020-01-31 09:36:55 -08:00
Sanjay Patel bc1148e7bc [PATCH] D73727: [SLP] drop poison-generating flags for shuffle reduction ops (PR44536)
We may calculate reassociable math ops in arbitrary order when creating a shuffle reduction,
so there's no guarantee that things like 'nsw' hold on those intermediate values. Drop all
poison-generating flags for safety.

This change is limited to shuffle reductions because I don't think we have a problem in the
general case (where we intersect flags of each scalar op that goes into a vector op), but if
there's evidence of other cases being wrong, we can extend this fix to cover those cases.

https://bugs.llvm.org/show_bug.cgi?id=44536

Differential Revision: https://reviews.llvm.org/D73727
2020-01-31 09:54:35 -05:00
Nikita Popov 480391035c [InstCombine] Remove unnecessary worklist add; NFCI
Again, this will already be added by IRBuilder.
2020-01-30 23:24:59 +01:00
Nikita Popov 90b5ed996b [InstCombine] Remove unnecessary worklist add; NFCI
The IRBuilder will automatically add instructions to the worklist.
Adding it manually is unnecessary, but may mess up worklist order.
2020-01-30 23:06:28 +01:00
Nikita Popov cad91074a6 [InstCombine] Create new insts in foldICmpEqIntrinsicWithConstant; NFCI
In line with current conventions, create new instructions rather
than modify two operands in place and performing manual worklist
management.

This should be NFC apart from possible worklist order changes.
2020-01-30 23:03:16 +01:00
Whitney Tsang e44f4a8a54 [LoopFusion] Move instructions from FC1.GuardBlock to FC0.GuardBlock and
from FC0.ExitBlock to FC1.ExitBlock when proven safe.

Summary:
Currently LoopFusion give up when the second loop nest guard
block or the first loop nest exit block is not empty. For example:

if (0 < N) {
  for (int i = 0; i < N; ++i) {}
  x+=1;
}
y+=1;
if (0 < N) {
  for (int i = 0; i < N; ++i) {}
}
The above example should be safe to fuse.
This PR moves instructions in FC1 guard block (e.g. y+=1;) to
FC0 guard block, or instructions in FC0 exit block (e.g. x+=1;) to
FC1 exit block, which then LoopFusion is able to fuse them.
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D73641
2020-01-30 18:02:22 +00:00
David Stenberg b54a8ec1bc [InstCombine][DebugInfo] Fold constants wrapped in metadata
Summary:
When constant folding, constants that are wrapped in metadata were not
folded. This could lead to dbg.values being the only user of a constant
expression, due to the non-dbg uses having been rewritten, resulting in
the constant later on being removed by some other pass. This occurred
with the attached test case, in which the non-rewritten GEP in the
dbg.value intrinsic was later on removed by globalopt.

This patch makes the code look through metadata and fold such constants.

I guess that we in the future may want to allow dbg.values using GEPs and
other constant expressions to be emittable even if there are no non-dbg
uses, but for example SelectionDAG does not support that.

Reviewers: jmorse, aprantl, vsk, davide

Reviewed By: aprantl, vsk, davide

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73630
2020-01-30 15:50:16 +01:00
Piotr Sobczak dd7148822b [InstCombine][AMDGPU] Trim components of s_buffer_load
Summary:
Add trimming of unused components of s_buffer_load.

For s_buffer_load and unformatted buffer_load also trim unused
components at the beginning of vector and update offset accordingly.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71785
2020-01-30 10:48:25 +01:00
Michael Forster 676c29694c Inline debug variable.
Summary:
In a release build this variable becomes unused and may break the build
with `-Werror,-Wunused-variable`.

Reviewers: gribozavr2, jdoerfert, sstefan1

Reviewed By: gribozavr2

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73683
2020-01-30 10:29:24 +01:00
Nikita Popov 8058196677 [InstCombine] Process newly inserted instructions in the correct order
InstCombine operates on the basic premise that the operands of the
currently processed instruction have already been simplified. It
achieves this by pushing instructions to the worklist in reverse
program order, so that instructions are popped off in program order.
The worklist management in the main combining loop also makes sure
to uphold this invariant.

However, the same is not true for all the code that is performing
manual worklist management. The largest problem (addressed in this
patch) are instructions inserted by InstCombine's IRBuilder. These
will be pushed onto the worklist in order of insertion (generally
matching program order), which means that a) the users of the
original instruction will be visited first, as they are pushed later
in the main loop and b) the newly inserted instructions will be
visited in reverse program order.

This causes a number of problems: First, folds operate on instructions
that have not had their operands simplified, which may result in
optimizations being missed (ran into this in
https://reviews.llvm.org/D72048#1800424, which was the original
motivation for this patch). Additionally, this increases the amount
of folds InstCombine has to perform, both within one iteration, and
by increasing the number of total iterations.

This patch addresses the issue by adding a Worklist.AddDeferred()
method, which is used for instructions inserted by IRBuilder. These
will only be added to the real worklist after the combine finished,
and in reverse order, so they will end up processed in program order.
I should note that the same should also be done to nearly all other
uses of Worklist.Add(), but I'm starting with just this occurrence,
which has by far the largest test fallout.

Most of the test changes are due to
https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where
we don't canonicalize something. These are neutral. One regression
has been addressed in D73575 and D73647. The remaining regression
in an shl+sdiv fold can't really be fixed without dropping another
transform, but does not seem particularly problematic in the first
place.

Differential Revision: https://reviews.llvm.org/D73411
2020-01-30 09:40:10 +01:00
Francesco Petrogalli 623cff81fe [llvm][VectorUtils] Tweak VFShape for scalable vector functions.
Summary:
This patch makes sure that the field VFShape.VF is greater than zero
when demangling the vector function name of scalable vector functions
encoded in the "vector-function-abi-variant" attribute.

This change is required to be able to provide instances of VFShape
that can be used to query the VFDatabase for the vectorization passes,
as such passes always require a positive value for the Vectorization Factor (VF)
needed by the vectorization process.

It is not possible to extract the value of VFShape.VF from the mangled
name of scalable vector functions, because it is encoded as
`x`. Therefore, the VFABI demangling function has been modified to
extract such information from the IR declaration of the vector
function, under the assumption that _all_ vectors in the signature of
the vector function have the same number of lanes. Such assumption is
valid because it is also assumed by the Vector Function ABI
specifications supported by the demangling function (x86, AArch64, and
LLVM internal one).

The unit tests that demangle scalable names have been modified by
adding the IR module that carries the declaration of the vector
function name being demangled.

In particular, the demangling function fails in the following cases:

1. When the declaration of the scalable vector function is not
    present in the module.

2. When the value of VFSHape.VF is not greater than 0.

Reviewers: jdoerfert, sdesmalen, andwar

Reviewed By: jdoerfert

Subscribers: mgorny, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73286
2020-01-30 05:53:56 +00:00
Johannes Doerfert 89c2e733e8 [Attributor] Pointer privatization attribute (argument promotion)
A pointer is privatizeable if it can be replaced by a new, private one.
Privatizing pointer reduces the use count, interaction between unrelated
code parts. This is a first step towards replacing argument promotion.
While we can already handle recursion (unlike argument promotion!) we
are restricted to stack allocations for now because we do not analyze
the uses in the callee.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D68852
2020-01-29 21:31:04 -06:00
Johannes Doerfert 791c9f1145 [Attributor] Fix TODO to avoid recomputation of results
The helpers AAReturnedFromReturnedValues and
AACallSiteReturnedFromReturned are useful not only to avoid code
duplication but also to avoid recomputation of results. If we have N
call sites we should not recompute the function return information N
times but once. These are mostly straightforward usages with some minor
improvements on the helpers and addition of a new one
(IRPosition::getAssociatedType) that knows about function return types.
2020-01-29 19:24:34 -06:00
Nikita Popov e086e23024 [InstCombine] Support non-splat vectors in icmp eq + add/sub fold
For the

    icmp eq (add X, C1), C2 => icmp eq X, C2-C1
    icmp eq (sub C1, X), C2 => icmp eq X, C1-C2

folds, this allows C1 to be non-splat and contain undefs.
C2 is still splat, due to the structure of the code.

This is to address the remaining part of the regression in D73411,
where demanded element analysis replaces some elements with undef.

Differential Revision: https://reviews.llvm.org/D73647
2020-01-29 20:56:58 +01:00
Elia Geretto ab2300bc15 [PassManagerBuilder] Remove global extension when a plugin is unloaded
This commit fixes PR39321.

GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction.

This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet.

Differential Revision: https://reviews.llvm.org/D71959
2020-01-29 16:15:45 +00:00
Simon Pilgrim 79748add70 Fix MSVC lamdba default capture mode warning. NFCI. 2020-01-29 15:47:04 +00:00
Whitney Tsang da58e68fdf [LoopFusion] Move instructions from FC1.Preheader to FC0.Preheader when
proven safe.

Summary:
Currently LoopFusion give up when the second loop nest preheader is
not empty. For example:

for (int i = 0; i < 100; ++i) {}
x+=1;
for (int i = 0; i < 100; ++i) {}
The above example should be safe to fuse.
This PR moves instructions in FC1 preheader (e.g. x+=1; ) to
FC0 preheader, which then LoopFusion is able to fuse them.
Reviewer: kbarton, Meinersbur, jdoerfert, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71821
2020-01-29 15:06:11 +00:00
Sanjay Patel 87f6314f8c [InstCombine] canonicalize splat shuffle after cmp
cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M

As discussed in PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
...we try harder to push shuffles after binops than after compares.

This patch handles the special (but presumably most common case) of
splat shuffles. If both operands are splats, then we can do the
comparison on the non-splat inputs followed by splat of the compare.
That should take care of the regression noted in D73411.

There's another potential fold requested in PR37463 to scalarize the
compare, but that's another patch (and it's not clear if we can do
that without the ability to undo it later):
https://bugs.llvm.org/show_bug.cgi?id=37463

Differential Revision: https://reviews.llvm.org/D73575
2020-01-29 08:34:29 -05:00
Johannes Doerfert 76843ba37f [Attributor][Fix] Initialize unused but loaded variable
This hopefully un-breaks:
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/38333
2020-01-28 23:52:16 -06:00
Johannes Doerfert ea5fabe60c [Attributor] Reuse existing logic to avoid duplication
There was a TODO in AAValueConstantRangeArgument to reuse
AAArgumentFromCallSiteArguments. We now do this by allowing new States
to be build from the bestState.
2020-01-28 23:45:59 -06:00
Johannes Doerfert 224085409d [Attributor][FIX] Treat invalidated attributes as changed
If we invalidate an attribute we need to inform all dependent ones even
if the fixpoint state is not invalid. Before we only continued
invalidation if the fixpoint state was invalid, now we signal a change
in case the fixpoint state is valid.

The test case was already included in D71620 but the problem was hiding
because it only manifested with the old PM (for that input).
2020-01-28 23:40:41 -06:00
Johannes Doerfert 53992c7bf7 [Attributor] Modularize AANoAliasCallSiteArgument to simplify extensions
This patch modularizes the way we check for no-alias call site arguments
by putting the existing logic into helper functions. The reasoning was
not changed but special cases for readonly/readnone were added.
2020-01-28 23:39:29 -06:00
Johannes Doerfert 24ae77eebf [Attributor] Mark a non-defined `null` pointer as `noalias`
If `null` is not defined we cannot access it, hence the pointer is
`noalias`. While this is not helpful on it's own it simplifies later
deductions that can skip over already known `noalias` pointers in
certain situations.
2020-01-28 23:09:37 -06:00
Johannes Doerfert 6626d1b7c0 [Attributor][NFC] Remove ugly and unneeded cast 2020-01-28 22:54:31 -06:00
Johannes Doerfert 02bd8180fc [Attributor][NFC] Improve debug messages 2020-01-28 22:53:19 -06:00
Johannes Doerfert b6dbd0f71f [Attributor][NFC] Internalize helper function 2020-01-28 22:50:34 -06:00
Vedant Kumar 8359511c62 [CodeExtractor] Remove stale llvm.assume calls from extracted region
During extraction, stale llvm.assume handles may be retained in the
original function. The setup is:

1) CodeExtractor unregisters assumptions in the blocks that are to be
   extracted.

2) Extraction happens. There are now two functions: f1 and f1.extracted.

3) Leftover assumptions in f1 (/not/ removed as they were not in the set of
   blocks to be extracted) now have affected-value llvm.assume handles in
   f1.extracted.

When assumptions for a value used in f1 are looked up, ValueTracking can assert
as some of the handles are in the wrong function. To fix this, simply erase the
llvm.assume calls in the extracted function.

Alternatives include flushing the assumption cache in the original function, or
walking all values used in the original function to prune stale affected-value
handles. Both seem more expensive.

Testing: check-llvm, LNT run with -mllvm -hot-cold-split enabled

rdar://58460728
2020-01-28 17:18:01 -08:00
Eli Friedman 2f6b9edfa8 [AliasAnalysis] Add missing FMRB_* enums.
Previously, the enums didn't account for all the possible cases, which
could cause misleading results (particularly for a "switch" on
FunctionModRefBehavior).

Fixes regression in polly from recent patch to add writeonly to memset.

While I'm here, also fix a few dubious uses of the FMRB_* enum values.

Differential Revision: https://reviews.llvm.org/D73154
2020-01-28 15:47:08 -08:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Whitney Tsang cd0cff4392 [NFCI][LoopUnrollAndJam] Minor changes.
Summary:
1. Add assertions.
2. Verify more analyses.
These changes are moved out of https://reviews.llvm.org/D73129 to
simplify that review.

Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto
Reviewed By: dmgreen
Subscribers: fhahn, hiraditya, zzheng, llvm-commits, prithayan, anhtuyen
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D73204
2020-01-28 20:24:23 +00:00
Petr Hosek 127d3abf25 [Instrumentation] Set hidden visibility for the bias variable
We have to avoid using a GOT relocation to access the bias variable,
setting the hidden visibility achieves that.

Differential Revision: https://reviews.llvm.org/D73529
2020-01-28 12:07:03 -08:00
Sanjay Patel 7a717d82ff [InstCombine] refactor foldVectorCmp(); NFC
We can handle other patterns here as shown in PR44588.
2020-01-28 14:40:48 -05:00
Florian Hahn 5d0ffbeb4d [Matrix] Mark expressions shared between multiple remarks.
This patch adds support for explicitly highlighting sub-expressions
shared by multiple leaf nodes. For example consider the following
code

  %shared.load = tail call <8 x double> @llvm.matrix.columnwise.load.v8f64.p0f64(double* %arg1, i32 %stride, i32 2, i32 4), !dbg !10, !noalias !10
  %trans = tail call <8 x double> @llvm.matrix.transpose.v8f64(<8 x double> %shared.load, i32 2, i32 4), !dbg !10
  tail call void @llvm.matrix.columnwise.store.v8f64.p0f64(<8 x double> %trans, double* %arg3, i32 10, i32 4, i32 2), !dbg !10
  %load.2 = tail call <30 x double> @llvm.matrix.columnwise.load.v30f64.p0f64(double* %arg3, i32 %stride, i32 2, i32 15), !dbg !10, !noalias !10
  %mult = tail call <60 x double> @llvm.matrix.multiply.v60f64.v8f64.v30f64(<8 x double> %trans, <30 x double> %load.2, i32 4, i32 2, i32 15), !dbg !11
  tail call void @llvm.matrix.columnwise.store.v60f64.p0f64(<60 x double> %mult, double* %arg2, i32 10, i32 4, i32 15), !dbg !11

We have two leaf nodes (the 2 stores) and the first store stores %trans
which is also used by the matrix multiply %mult. We generate separate
remarks for each leaf (stores). To denote that parts are shared, the
shared expressions are marked as shared (), with a reference to the
other remark that shares it. The operation summary also denotes the
shared operations separately.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D72526
2020-01-28 09:27:55 -08:00
Florian Hahn d1f849a284 [LV] Hoist code to mark conditional assumes as dead to caller (NFC).
This is a follow-up suggested in D73423. It is sufficient to just add
the conditional assumes to DeadInstructions once.
2020-01-28 08:50:44 -08:00
Michael Liao 9c54b42338 Fix warning of `-Wcast-qual`. NFC. 2020-01-28 11:36:43 -05:00
Florian Hahn a911fef3dd [LV] Do not try to sink dead instructions.
Dead instructions do not need to be sunk. Currently we try and record
the recipies for them, but there are no recipes emitted for them and
there's nothing to sink. They can be removed from SinkAfter while
marking them for recording.

Fixes PR44634.

Reviewers: rengolin, hsaito, fhahn, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D73423
2020-01-28 08:28:03 -08:00
Whitney Tsang 78dc64989c [CodeMoverUtils] Improve IsControlFlowEquivalent.
Summary:
Currently IsControlFlowEquivalent determine if two blocks are control
flow equivalent by checking if A dominates B and B post dominates A.
There exists blocks that are control flow equivalent even if they don't
satisfy the A dominates B and B post dominates A condition.
For example,

if (cond)
  A
if (cond)
  B
In the PR, we determine if two blocks are control flow equivalent by
also checking if the two sets of conditions A and B depends on are
equivalent.
Reviewer: jdoerfert, Meinersbur, dmgreen, etiotto, bmahjour, fhahn,
hfinkel, kbarton
Reviewed By: fhahn
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71578
2020-01-28 14:18:00 +00:00
Florian Hahn 62e228f8fd [Matrix] Add info about number of operations to remarks.
This patch updates the remark to also include a summary of the number of
vector operations generated for each matrix expression.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D72480
2020-01-27 17:43:39 -08:00
Florian Hahn 949294f396 [Matrix] Add optimization remarks for matrix expression.
Generate remarks for matrix operations in a function. To generate remarks
for matrix expressions, the following approach is used:
1. Collect leafs of matrix expressions (done in
   RemarkGenerator::getExpressionLeafs).  Leafs are lowered matrix
   instructions without other matrix users (like stores).

2. For each leaf, create a remark containing a linearizied version of the
   matrix expression.

The following improvements will be submitted as follow-ups:
* Summarize number of vector instructions generated for each expression.
* Account for shared sub-expressions.
* Propagate matrix remarks up the inlining chain.

The information provided by the matrix remarks helps users to spot cases
where matrix expression got split up, e.g. due to inlining not
happening. The remarks allow users to address those issues, ensuring
best performance.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D72453
2020-01-27 16:39:29 -08:00
Sanjay Patel 747242af8d [InstCombine] allow more narrowing of casted select
D47163 created a rule that we should not change the casted
type of a select when we have matching types in its compare condition.
That was intended to help vector codegen, but it also could create
situations where we miss subsequent folds as shown in PR44545:
https://bugs.llvm.org/show_bug.cgi?id=44545

By using shouldChangeType(), we can continue to get the vector folds
(because we always return false for vector types). But we also solve
the motivating bug because it's ok to narrow the scalar select in that
example.

Our canonicalization rules around select are a mess, but AFAICT, this
will not induce any infinite looping from the reverse transform (but
we'll need to watch for that possibility if committed).

Side note: there's a similar use of shouldChangeType() for phi ops
just below this diff, and the source and destination types appear to
be reversed.

Differential Revision: https://reviews.llvm.org/D72733
2020-01-27 16:35:50 -05:00
Sanjay Patel 242fed9d7f [InstCombine] convert fsub nsz with fneg operand to -(X + Y)
This was noted in D72521 - we need to match fneg specifically to
consistently handle that pattern along with (-0.0 - X).
2020-01-27 14:49:15 -05:00
Nikita Popov bcfa0f592f [InstCombine] Move negation handling into freelyNegateValue()
Followup to D72978. This moves existing negation handling in
InstCombine into freelyNegateValue(), which make it composable.
In particular, root negations of div/zext/sext/ashr/lshr/sub can
now always be performed through a shl/trunc as well.

Differential Revision: https://reviews.llvm.org/D73288
2020-01-27 20:46:23 +01:00
Teresa Johnson 2f63d549f1 Restore "[LTO/WPD] Enable aggressive WPD under LTO option"
This restores 59733525d3 (D71913), along
with bot fix 19c76989bb.

The bot failure should be fixed by D73418, committed as
af954e441a.

I also added a fix for non-x86 bot failures by requiring x86 in new test
lld/test/ELF/lto/devirt_vcall_vis_public.ll.
2020-01-27 07:55:05 -08:00
Whitney Tsang 2b335e9aae [LoopUnroll] Remove remapInstruction().
Summary:
LoopUnroll can reuse the RemapInstruction() in ValueMapper, or
remapInstructionsInBlocks() in CloneFunction, depending on the needs.
There is no need to have its own version in LoopUnroll.

By calling RemapInstruction() without TypeMapper or Materializer and
with Flags (RF_NoModuleLevelChanges | RF_IgnoreMissingLocals), it does
the same as remapInstruction(). remapInstructionsInBlocks() calls
RemapInstruction() exactly as described.

Looking at the history, I cannot find any obvious reason to have its own
version.
Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto,
foad, aprantl
Reviewed By: jdoerfert
Subscribers: hiraditya, zzheng, llvm-commits, prithayan, anhtuyen
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D73277
2020-01-27 15:42:13 +00:00
Guillaume Chatelet 07c9d53266 [Alignment][NFC] Use Align with CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73449
2020-01-27 10:58:36 +01:00
Guillaume Chatelet d0a7cc7177 [Alignment][NFC] Use Align with CreateMaskedScatter/Gather
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

This patch shows that CreateMaskedScatter/CreateMaskedGather can only take positive non zero alignment values.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits, delena

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73361
2020-01-27 10:17:14 +01:00
Evgenii Stepanov 1df8549b26 [msan] Instrument x86.pclmulqdq* intrinsics.
Summary:
These instructions ignore parts of the input vectors which makes the
default MSan handling too strict and causes false positive reports.

Reviewers: vitalybuka, RKSimon, thakis

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73374
2020-01-24 14:31:06 -08:00
Andy Kaylor b35b7da460 [PGO] Attach appropriate funclet operand bundles to value profiling instrumentation calls
Patch by Chris Chrulski

When generating value profiling instrumentation, ensure the call gets the
correct funclet token, otherwise WinEHPrepare will turn the call (and all
subsequent instructions) into unreachable.

Differential Revision: https://reviews.llvm.org/D73221
2020-01-24 11:20:53 -08:00
Simon Pilgrim abd1927d44 Fix some comment typos. NFC. 2020-01-24 18:18:42 +00:00
Alina Sbirlea 0d90d2457c [LoopStrengthReduce] Teach LoopStrengthReduce to preserve MemorySSA is available. 2020-01-24 10:13:52 -08:00
Andy Kaylor a33accde95 [PGO] Early detection regarding whether pgo counter promotion is possible
Patch by Chris Chrulski

This fixes a problem with the current behavior when assertions are enabled.
A loop that exits to a catchswitch instruction is skipped for the counter
promotion, however this check was being done after the PGOCounterPromoter
tried to collect an insertion point for the exit block. A call to
getFirstInsertionPt() on a block that begins with a catchswitch instruction
triggers an assertion. This change performs a check whether the counter
promotion is possible prior to collecting the ExitBlocks and InsertPts.

Differential Revision: https://reviews.llvm.org/D73222
2020-01-24 09:55:41 -08:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Evgeny Leviant 8973fae195 [WPD] Allow load/save bitcoded index when running opt -wholeprogramdevirt
Differential revision: https://reviews.llvm.org/D73094
2020-01-24 00:31:39 -08:00
Andy Kaylor c467faf23c [WinEH] Ignore lifetime.end PHI nodes in empty cleanuppads
This fixes a bug where a PHI node that is only referenced by a lifetime.end intrinsic in an otherwise empty cleanuppad can cause SimplyCFG to create an SSA violation while removing the empty cleanuppad. Theoretically the same problem can occur with debug intrinsics.

Differential Revision: https://reviews.llvm.org/D72540
2020-01-23 18:18:50 -08:00
Teresa Johnson 90e630a95e Revert "[LTO/WPD] Enable aggressive WPD under LTO option"
This reverts commit 59733525d3.

There is a windows sanitizer bot failure in one of the cfi tests
that I will need some time to figure out:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/57155/steps/stage%201%20check/logs/stdio
2020-01-23 17:29:24 -08:00
Johannes Doerfert 7ad17e008b [Attributor] Avoid REQUIRED dependences in favor of OPTIONAL ones
When we use information only to short-cut deduction or improve it, we
can use OPTIONAL dependences instead of REQUIRED ones to avoid cascading
pessimistic fixpoints.

We also need to track dependences only when we use assumed information,
e.g., we act on assumed liveness information.
2020-01-23 18:42:46 -06:00
Johannes Doerfert 214ed3f676 [Attributor] Record dependences only when necessary
If we use assumed information from AAValueSimplify we need to record
an OPTIONAL dependence, otherwise we do not.
2020-01-23 18:42:45 -06:00
Johannes Doerfert 5429c82db2 [Attributor][FIX] Avoid dangling pointers during code deletion
It can happen that we have instructions in the ToBeDeletedInsts set
which are deleted earlier already. To avoid dangling pointers we use
weak tracking handles.
2020-01-23 18:42:45 -06:00
Johannes Doerfert ff6254dc26 [Attributor][FIX] Handle non-pointers when following uses
When we follow uses, e.g., in AAMemoryBehavior or AANoCapture, we need
to make sure the value is a pointer before we ask for abstract
attributes only valid for pointers. This happens because we follow
pointers through calls that do not capture but may return the value.
2020-01-23 18:42:45 -06:00
Johannes Doerfert 9dcf889d15 [Attributor][NFC] Do not (try to) simplify void values
We might accidentally ask AAValueSimplify to simplify a void value. That
can lead to very interesting, and very wrong, results. We now handle
this case gracefully.
2020-01-23 18:42:45 -06:00
Alina Sbirlea 1d09174290 [LoopStrengthReduce] Reuse utility method to clean dead instructions. [NFCI]
Create a utility wrapper for the RecursivelyDeleteTriviallyDeadInstructions utility
method, which sets to nullptr the instructions that are not trivially
dead. Use the new method in LoopStrengthReduce.
Alternative: add a bool to the same method; this option adds a marginal
amount of overhead to the other callers, and the method needs to be
updated to return a bool status when it removes/doesn't remove
instructions.
2020-01-23 16:27:32 -08:00
Johannes Doerfert 30179d7ecf [Attributor][FIX][Alignment] Do not report a change if there was none
If alignment was manifested but it is actually only as good as the
data-layout provided one we should not report it as a change.

For testing purposes we still manifest the information.
2020-01-23 18:13:52 -06:00
Johannes Doerfert e273ac4d88 [Attributor][NFC] Add an assertion 2020-01-23 18:13:52 -06:00
Johannes Doerfert d07b5a5525 [Attributor][NFC] Fix spelling 2020-01-23 18:13:52 -06:00
Johannes Doerfert 2baf000ecc [Attributor] `byval` arguments are always `noalias`
`byval` introduces a local copy of the argument. That copy cannot alias
anything.
2020-01-23 18:13:52 -06:00
Johannes Doerfert 30ae859c69 [Attributor][FIX] Store alignment only holds for the pointer value
We accidentally used the store alignment for the value operand as well,
which is incorrect and crashed the SPASS application in the test suite.
2020-01-23 18:13:52 -06:00
Teresa Johnson 59733525d3 [LTO/WPD] Enable aggressive WPD under LTO option
Summary:
Third part in series to support Safe Whole Program Devirtualization
Enablement, see RFC here:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html

This patch adds type test metadata under -fwhole-program-vtables,
even for classes without hidden visibility. It then changes WPD to skip
devirtualization for a virtual function call when any of the compatible
vtables has public vcall visibility.

Additionally, internal LLVM options as well as lld and gold-plugin
options are added which enable upgrading all public vcall visibility
to linkage unit (hidden) visibility during LTO. This enables the more
aggressive WPD to kick in based on LTO time knowledge of the visibility
guarantees.

Support was added to all flavors of LTO WPD (regular, hybrid and
index-only), and to both the new and old LTO APIs.

Unfortunately it was not simple to split the first and second parts of
this part of the change (the unconditional emission of type tests and
the upgrading of the vcall visiblity) as I needed a way to upgrade the
public visibility on legacy WPD llvm assembly tests that don't include
linkage unit vcall visibility specifiers, to avoid a lot of test churn.

I also added a mechanism to LowerTypeTests that allows dropping type
test assume sequences we now aggressively insert when we invoke
distributed ThinLTO backends with null indexes, which is used in testing
mode, and which doesn't invoke the normal ThinLTO backend pipeline.

Depends on D71907 and D71911.

Reviewers: pcc, evgeny777, steven_wu, espindola

Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71913
2020-01-23 16:09:44 -08:00
Alina Sbirlea 9e66c4ec12 [Utils] Use WeakTrackingVH in vector used as scratch storage.
The utility method RecursivelyDeleteTriviallyDeadInstructions receives
as input a vector of Instructions, where all inputs are valid
instructions. This same vector is used as a scratch storage (per the
header comment) to recursively delete instructions. If an instruction is
added as an operand of multiple other instructions, it may be added twice,
then deleted once, then the second reference in the vector is invalid.
Switch to using a Vector<WeakTrackingVH>.
This change facilitates a clean-up in LoopStrengthReduction.
2020-01-23 16:04:57 -08:00
Florian Hahn 4ed7355e44 [IPSCCP] Use ParamState for arguments at call sites.
We currently use integer ranges to merge concrete function arguments.
We use the ParamState range for those, but we only look up concrete
values in the regular state. For concrete function arguments that are
themselves arguments of the containing function, we can use the param
state directly and improve the precision in some cases.

Besides improving the results in some cases, this is also a small step towards
switching to ValueLatticeElement, by allowing D60582 to be a NFC.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71836
2020-01-23 13:55:42 -08:00
Teresa Johnson 458676db6e [WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables
Summary:
First patch to support Safe Whole Program Devirtualization Enablement,
see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html

Always emit !vcall_visibility metadata under -fwhole-program-vtables,
and not just for -fvirtual-function-elimination. The vcall visibility
metadata will (in a subsequent patch) be used to communicate to WPD
which vtables are safe to devirtualize, and we will optionally convert
the metadata to hidden visibility at link time. Subsequent follow on
patches will help enable this by adding vcall_visibility metadata to the
ThinLTO summaries, and always emit type test intrinsics under
-fwhole-program-vtables (and not just for vtables with hidden
visibility).

In order to do this safely with VFE, since for VFE all vtable loads must
be type checked loads which will no longer be the case, this patch adds
a new "Virtual Function Elim" module flag to communicate to GlobalDCE
whether to perform VFE using the vcall_visibility metadata.

One additional advantage of using the vcall_visibility metadata to drive
more WPD at LTO link time is that we can use the same mechanism to
enable more aggressive VFE at LTO link time as well. The link time
option proposed in the RFC will convert vcall_visibility metadata to
hidden (aka linkage unit visibility), which combined with
-fvirtual-function-elimination will allow it to be done more
aggressively at LTO link time under the same conditions.

Reviewers: pcc, ostannard, evgeny777, steven_wu

Subscribers: mehdi_amini, Prazek, hiraditya, dexonsmith, davidxl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71907
2020-01-23 11:36:01 -08:00
Alina Sbirlea 6770de9b8d [LoopIdiomRecognize] Teach LoopIdiomRecognize to preserve MemorySSA. 2020-01-23 11:31:12 -08:00
Alina Sbirlea a0f627d584 [IndVarSimplify] Fix for MemorySSA preserve. 2020-01-23 11:06:16 -08:00
Justin Bogner b81a337be7 [LoopUnroll] Avoid UB when converting from WeakVH to `Value *`
Calling `operator*` on a WeakVH with a null value yields a null
reference, which is UB. Avoid this by implicitly converting the WeakVH
to a `Value *` rather than dereferencing and then taking the address
for the type conversion.

Differential Revision: https://reviews.llvm.org/D73280
2020-01-23 10:36:39 -08:00
Guillaume Chatelet 59f95222d4 [Alignment][NFC] Use Align with CreateAlignedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73274
2020-01-23 17:34:32 +01:00
Kazu Hirata 41784bed01 Revert "Resubmit: [JumpThreading] Thread jumps through two basic blocks"
This reverts commit 53b68e676f.

Our internal tests are showing breakage with this patch.
2020-01-23 06:34:03 -08:00
Fedor Sergeev 2f6987ba61 [LoopRotate] add ability to repeat loop rotation until non-deoptimizing exit is found
In case of loops with multiple exit where all-but-one exit are deoptimizing
it might happen that the first rotation will end up with latch having a deoptimizing
exit. This makes the loop unsuitable for trip-count analysis (say, getLoopEstimatedTripCount)
as well as for loop transformations that know how to handle multple deoptimizing exits.

It pretty much means that canonical form in multple-deoptimizing-exits case should be
with non-deoptimizing exit at latch.
Teach loop-rotation to reach this canonical form by repeating rotation.

-loop-rotate-multi option introduced to control this behavior, currently disabled by default.

Reviewers: skatkov, asbirlea, reames, fhahn
Reviewed By: skatkov

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73058
2020-01-23 15:56:24 +03:00
Guillaume Chatelet 279fa8e006 [Alignement][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Daniil Suchkov 4a8dbc617d [SSAUpdater] Don't call ValueIsRAUWd upon single use replacement
It is incorrect to call ValueHandleBase::ValueIsRAUWd when only one use
is replaced since it simply violates semantics of the callback and leads
to bugs like PR44320.

Previously this call was used specifically to keep LICM's cache of
AliasSetTrackers up to date across passes (as PR36801 showed, even for
that purpose it didn't work properly), but since LICM doesn't have that
cache anymore, we can safely remove this incorrect call with no
repercussions.

This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44320

Reviewers: asbirlea, fhahn, efriedma, reames

Reviewed-By: asbirlea

Differential Revision: https://reviews.llvm.org/D73089
2020-01-23 15:53:53 +07:00
Daniil Suchkov 6fc9e60149 NFC. Remove obsolete SimpleAnalysis infrastructure
Apparently cache of AliasSetTrackers held by LICM was the only user of
SimpleAnalysis infrastructure. Now, given that we no longer have that
cache, this infrastructure is obsolete and, taking into account its
nature, we don't want any new solutions to be based on it.

Reviewers: asbirlea, fhahn, efriedma, reames

Reviewed-By: asbirlea

Differential Revision: https://reviews.llvm.org/D73085
2020-01-23 13:58:30 +07:00
Daniil Suchkov 53a28bd891 [LICM] NFC. Remove AST caching infrastructure
Since LICM doesn't use AST caching any more (see D73081), this
infrastructure is now obsolete and we can remove it.

Reviewers: asbirlea, fhahn, efriedma, reames

Reviewed-By: asbirlea

Differential Revision: https://reviews.llvm.org/D73084
2020-01-23 12:33:50 +07:00
Florian Hahn f14f2a8568 [LV] Fix predication for branches with matching true and false succs.
Currently due to the edge caching, we create wrong predicates for
branches with matching true and false successors. We will cache the
condition for the edge from the true successor, and then lookup the same
edge (src and dst are the same) for the edge to the false successor.

If both successors match, the condition should always be true. At the
moment, we cannot really create constant VPValues, but we can just
create a true condition as X | !X. Later passes will clean that up.

Fixes PR44488.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D73079
2020-01-22 18:34:11 -08:00
Jonas Devlieghere cf2b498d28 [llvm/Transforms] Fix warning: private field 'MSSA' is not used 2020-01-22 18:07:53 -08:00
Alina Sbirlea adc4faf532 [IndVarSimplify] Teach IndVarSimplify to preserve MemorySSA. 2020-01-22 16:33:17 -08:00
Alina Sbirlea b5b6126d97 [IndVarSimplify] Cleanup spaces and reduce variable scope [NFCI]
Minor clean-ups + clang-format.
2020-01-22 15:32:20 -08:00
Alina Sbirlea 6baf31b7c1 [LoopIdiomRecognize] Reduce variable scope. [NFCI] 2020-01-22 15:30:08 -08:00
Nikita Popov 0b83c5a78f [InstCombine] Combine neg of shl of sub (PR44529)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44529. We already have
a combine to sink a negation through a left-shift, but it currently
only works if the shift operand is negatable without creating any
instructions. This patch introduces freelyNegateValue() as a more
powerful extension of dyn_castNegVal(), which allows negating a
value as long as this doesn't end up increasing instruction count.
Specifically, this patch adds support for negating A-B to B-A.

This mechanism could in the future be extended to handle general
negation chains that a) start at a proper 0-X negation and b) only
require one operand to be freely negatable. This would end up as a
weaker form of D68408 aimed at the most obviously profitable subset
that eliminates a negation entirely.

Differential Revision: https://reviews.llvm.org/D72978
2020-01-22 23:03:58 +01:00
Nikita Popov efba7ed05e [PatternMatch] Make m_c_ICmp swap the predicate (PR42801)
This addresses https://bugs.llvm.org/show_bug.cgi?id=42801.
The m_c_ICmp() matcher is changed to provide the swapped predicate
if the operands are swapped.

Existing uses of m_c_ICmp() fall in one of two categories: Working
on equality predicates only, where swapping is irrelevant.
Or performing a manual swap, in which case this patch removes it.

The only exception is the foldICmpWithLowBitMaskedVal() fold, which
does not swap the predicate, and instead reasons about whether
a swap occurred or not for each predicate. Getting the swapped
predicate allows us to merge the logic for pairs of predicates,
instead of duplicating it.

Differential Revision: https://reviews.llvm.org/D72976
2020-01-22 22:56:26 +01:00
Alina Sbirlea efb130fc93 [LoopDeletion] Teach LoopDeletion to preserve MemorySSA if available.
If MemorySSA analysis is analysis, LoopDeletion now preserves it.
2020-01-22 11:38:38 -08:00
Sanjay Patel 0ade2abdb0 [InstCombine] fneg(X + C) --> -C - X
This is 1 of the potential folds uncovered by extending D72521.

We don't seem to do this in the backend either (unless I'm not
seeing some target-specific transform).

icc and gcc (appears to be target-specific) do this transform.

Differential Revision: https://reviews.llvm.org/D73057
2020-01-22 09:48:43 -05:00
Guillaume Chatelet 0957233320 [Alignment][NFC] Use Align with CreateMaskedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73106
2020-01-22 11:04:39 +01:00
Daniil Suchkov 7bdc83f340 [LICM] Don't cache AliasSetTrackers when run under legacy PM
Summary:
This is the first step towards complete removal of AST caching from
LICM. Attempts to keep LICM's AST cache up to date across passes can lead
to miscompiles like this one: https://bugs.llvm.org/show_bug.cgi?id=44320.

LICM has already switched to using MemorySSA to do sinking and hoisting
and only builds an AliasSetTracker on demand for the promoteToScalars
step, without caching it from one LICM instance to the next. Given this,
we don't have compile-time reasons to keep AST caching any more.
The only scenario where the caching would be used currently is when
using the LegacyPassManager and setting -enable-mssa-loop-dependency=false.

This switch should help us to surface any possible issues that may arise
along this way, also it turns subsequent removal of AST caching into NFC.

Reviewers: asbirlea, fhahn, efriedma, reames

Reviewed By: asbirlea

Subscribers: hiraditya, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73081
2020-01-22 13:16:45 +07:00
Andrei Elovikov e1d6d36852 [SLP] Don't allow Div/Rem as alternate opcodes
Summary:
We don't have control/verify what will be the RHS of the division, so it might
happen to be zero, causing UB.

Reviewers: Vasilis, RKSimon, ABataev

Reviewed By: ABataev

Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72740
2020-01-21 15:21:17 -08:00
Florian Hahn f42994f228 [Matrix] Hide and describe matrix-propagate-shape option. 2020-01-21 14:28:47 -08:00
Guillaume Chatelet bc8a1ab26f [Alignment][NFC] Use Align with CreateMaskedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73087
2020-01-21 14:13:22 +01:00
Sanjay Patel 7bee94410c [InstCombine] form copysign from select of FP constants (PR44153)
This should be the last step needed to solve the problem in the
description of PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

If we're casting an FP value to int, testing its signbit, and then
choosing between a value and its negated value, that's a
complicated way of saying "copysign":

(bitcast X) <  0 ? -TC :  TC --> copysign(TC,  X)

Differential Revision: https://reviews.llvm.org/D72643
2020-01-20 10:51:14 -05:00
Guillaume Chatelet 46b9563cf6 [Alignment][NFC] Use Align with CreateElementUnorderedAtomicMemCpy
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, nicolasvasilache

Subscribers: hiraditya, jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73041
2020-01-20 15:39:45 +01:00
Evgeniy Brevnov af7e158872 [LV] Vectorizer should adjust trip count in profile information
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.

Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov

Reviewed By: Ayal, DaniilSuchkov

Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67905
2020-01-20 18:36:28 +07:00
Evgeniy Brevnov cfe97681cd [NFC][LoopUtils] Minor change in comment according to review D71990. 2020-01-20 17:10:10 +07:00
Evgeniy Brevnov 10357e1c89 [LoopUtils] Better accuracy for getLoopEstimatedTripCount.
Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2.

Reviewers: Ayal, fhahn

Reviewed By: Ayal

Subscribers: mgorny, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71990
2020-01-20 16:58:07 +07:00
Sjoerd Meijer 93175a5caa [IndVarSimplify][LoopUtils] rewriteLoopExitValues. NFCI
This moves `rewriteLoopExitValues()` from IndVarSimplify to LoopUtils thus
making it a generic loop utility function.  This allows to rewrite loop exit
values by just calling this function without running the whole IndVarSimplify
pass.

We use this in D72714 to rematerialise the iteration count in exit blocks, so
that we can clean-up loop update expressions inside the hardware-loops later.

Differential Revision: https://reviews.llvm.org/D72602
2020-01-20 09:05:00 +00:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Alina Sbirlea 9f6c6ee6b9 [MemDepAnalysis/VNCoercion] Move static method to its only use. [NFCI]
Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize
does not have or use any info specific to MemoryDependenceResults.
Move it to its only user: VNCoercion.
2020-01-17 15:18:42 -08:00
Petr Hosek d3db13af7e [profile] Support counter relocation at runtime
This is an alternative to the continous mode that was implemented in
D68351. This mode relies on padding and the ability to mmap a file over
the existing mapping which is generally only available on POSIX systems
and isn't suitable for other platforms.

This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at abitrary
location, and set bias to the offset between the original and the new
counter location, at which point every subsequent counter access will be
to the new location, which allows updating profile directly akin to the
continous mode.

The advantage of this implementation is that doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).

Differential Revision: https://reviews.llvm.org/D69740
2020-01-17 15:02:23 -08:00
Peter Collingbourne cd40bd0a32 hwasan: Move .note.hwasan.globals note to hwasan.module_ctor comdat.
As of D70146 lld GCs comdats as a group and no longer considers notes in
comdats to be GC roots, so we need to move the note to a comdat with a GC root
section (.init_array) in order to prevent lld from discarding the note.

Differential Revision: https://reviews.llvm.org/D72936
2020-01-17 13:40:52 -08:00
Drew Wock 0bcfafc5e7 [SeparateConstOffsetFromGEP] Fix: sext(a) + sext(b) -> sext(a + b) matches add and sub instructions with one another
During the SeparateConstOffsetFromGEP pass, signed extensions are distributed
to the values that feed into them and then later recombined. The recombination
stage is somewhat problematic- it doesn't differ add and sub instructions
from another when matching the sext(a) +/- sext(b) -> sext(a +/- b) pattern
in some instances.

An example- the IR contains:
%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extA + extB

The problematic optimization will transform that into:

%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extend subuAuB ; Obviously not semantically equivalent to the IR input.

This patch fixes that.

Patch by Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D65967
2020-01-17 12:22:52 -05:00
Nikita Popov 522c030aa9 [InstCombine] Fix worklist management in DSE (PR44552)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make
sure that the store is reprocessed, because performing DSE may
expose more DSE opportunities.

There is a slight caveat here though: We need to make sure that we
add back the store the worklist first, because that means it will
be processed after the operands of the removed store have been
processed. This is a general bug in InstCombine worklist management
that I hope to address at some point, but for now it means we need
to do this manually rather than just returning the instruction as
changed.

Differential Revision: https://reviews.llvm.org/D72807
2020-01-17 18:10:56 +01:00
Nikita Popov 77befe54f7 [InstCombine] Fix worklist management in return combine
There are two related bugs here: First, we don't add the operand
we're replacing to the worklist, which means it may not get DCEd
(see test change). Second, usually this would just get picked up
in the next iteration, but we also do not report the instruction
as changed. This means that we do not get that extra instcombine
iteration, and more importantly, may break the pass pipeline, as
the function is not marked as changed.

Differential Revision: https://reviews.llvm.org/D72864
2020-01-17 17:59:23 +01:00
Nikita Popov 2ca092f320 [InstCombine] Support disabling expensive combines in opt
Currently, there is no way to disable ExpensiveCombines when doing
a standalone opt -instcombine run, as that's the default, and the
opt option can currently only be used to force enable, not to force
disable. The only way to disable expensive combines is via -O1 or -O2,
but that of course also runs the rest of the kitchen sink...

This patch allows using opt -instcombine -expensive-combines=0 to
run InstCombine without ExpensiveCombines.

Differential Revision: https://reviews.llvm.org/D72861
2020-01-17 17:56:20 +01:00
Matt Arsenault 3ef8cdf666 AMDGPU: Do permlane16 vdst_in discard optimization in InstCombine
There's more potential value to discarding the source value earlier,
since we always know the value of the fi/bc bits.
2020-01-16 17:27:53 -05:00
Kazu Hirata 53b68e676f Resubmit: [JumpThreading] Thread jumps through two basic blocks
This reverts commit 2d258ed931.  This
revision fixes the Windows build and adds a testcase for it, namely
thread-two-bbs3.ll.  My original patch improperly copied EH pads on
Windows.  This patch disregards jump threading opportunities having to
do with EH pads.

[JumpThreading] Thread jumps through two basic blocks

Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

  bb3:
    %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above.  Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

  bb3:
    %var = phi i32* [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb3.dup:
    %var = phi i32* [ null, %bb1 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70247
2020-01-16 12:33:37 -08:00
Arkady Shlykov c87982b467 Revert "[Loop Peeling] Add possibility to enable peeling on loop nests."
This reverts commit 3f3017e because there's a failure on peel-loop-nests.ll
with LLVM_ENABLE_EXPENSIVE_CHECKS on.

Differential Revision: https://reviews.llvm.org/D70304
2020-01-16 10:33:38 -08:00
Fedor Sergeev 3478551bf3 [GVN] introduce GVNOptions to control GVN pass behavior
There are a few global (cl::opt) controls that enable optional
behavior in GVN. Introduce GVNOptions that provide corresponding
per-pass instance controls.

That will allow to use GVN multiple times in pipeline each time
with different settings.

Reviewers: asbirlea, rnk, reames, skatkov, fhahn
Reviewed By: fhahn

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72732
2020-01-16 20:21:08 +03:00
Mircea Trofin 7acfda633f [llvm] Make new pass manager's OptimizationLevel a class
Summary:
The old pass manager separated speed optimization and size optimization
levels into two unsigned values. Coallescing both in an enum in the new
pass manager may lead to unintentional casts and comparisons.

In particular, taking a look at how the loop unroll passes were constructed
previously, the Os/Oz are now (==new pass manager) treated just like O3,
likely unintentionally.

This change disallows raw comparisons between optimization levels, to
avoid such unintended effects. As an effect, the O{s|z} behavior changes
for loop unrolling and loop unroll and jam, matching O2 rather than O3.

The change also parameterizes the threshold values used for loop
unrolling, primarily to aid testing.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: zzheng, ychen, mehdi_amini, hiraditya, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72547
2020-01-16 09:00:56 -08:00
Francesco Petrogalli 66c120f025 [VectorUtils] Rework the Vector Function Database (VFDatabase).
Summary:
This commits is a rework of the patch in
https://reviews.llvm.org/D67572.

The rework was requested to prevent out-of-tree performance regression
when vectorizing out-of-tree IR intrinsics. The vectorization of such
intrinsics is enquired via the static function `isTLIScalarize`. For
detail see the discussion in https://reviews.llvm.org/D67572.

Reviewers: uabelho, fhahn, sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72734
2020-01-16 15:08:26 +00:00
Simon Pilgrim 23a887b0dd Fix unused variable warning. NFCI. 2020-01-16 13:02:40 +00:00
Florian Hahn 23c113802e [LV] Allow assume calls in predicated blocks.
The assume intrinsic is intentionally marked as may reading/writing
memory, to avoid passes moving them around. When flattening the CFG
for predicated blocks, we have to drop the assume calls, as they
are control-flow dependent.

There are some cases where we can do better (when control flow is
preserved), but that is follow-up work.

Fixes PR43620.

Reviewers: hsaito, rengolin, dcaballe, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D68814
2020-01-16 10:11:35 +00:00
Sameer Sahasrabuddhe ed181efa17 [HIP][AMDGPU] expand printf when compiling HIP to AMDGPU
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
  program for the AMDGPU target.

The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D71365
2020-01-16 15:15:38 +05:30
Vedant Kumar 360abb7ee5 [CodeExtractor] Transfer debug info to extracted function
After extracting, fix up debug info in both the old and new functions by

1) Pointing line locations and debug intrinsics to the new subprogram
   scope, and

2) Deleting intrinsics which point to values outside of the new
   function.

Depends on https://reviews.llvm.org/D72795.

Testing: check-llvm, check-clang, a build of LNT in the `-Os -g` config
with "-mllvm -hot-cold-split=1" set, and end-to-end debugging of a toy
program which undergoes splitting to verify that lldb can find
variables, single step, etc. in extracted code.

rdar://45507940

Differential Revision: https://reviews.llvm.org/D72801
2020-01-15 15:38:36 -08:00
Fedor Sergeev 8a4d12ae5b [BasicBlock] add helper getPostdominatingDeoptimizeCall
It appears to be rather useful when analyzing Loops with multiple
deoptimizing exits, perhaps merged ones.
For now it is used in LoopPredication, will be adding more uses
in other loop passes.

Reviewers: asbirlea, fhahn, skatkov, spatel, reames
Reviewed By: reames

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72754
2020-01-16 01:15:57 +03:00
Mircea Trofin 5466597fee [NFC] Refactor InlineResult for readability
Summary:
InlineResult is used both in APIs assessing whether a call site is
inlinable (e.g. llvm::isInlineViable) as well as in the function
inlining utility (llvm::InlineFunction). It means slightly different
things (can/should inlining happen, vs did it happen), and the
implicit casting may introduce ambiguity (casting from 'false' in
InlineFunction will default a message about hight costs,
which is incorrect here).

The change renames the type to a more generic name, and disables
implicit constructors.

Reviewers: eraman, davidxl

Reviewed By: davidxl

Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72744
2020-01-15 13:34:20 -08:00
Zhongduo Lin 34ba96a3d4 [NFC][IndVarSimplify] remove duplicate code in widenWithVariantLoadUseCodegen.
Summary: Duplicate code in widenWithVariantLoadUseCodegen is removed and also use assert to check unknown extension type as it should be filtered out by the pre condition check before calling this function.

Reviewers: az, sanjoy, sebpop, efriedma, javed.absar, sanjoy.google

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72652
2020-01-15 16:27:58 -05:00
Vedant Kumar a2cc80bc95 DebugInfo: Factor out logic to update locations in MD_loop metadata, NFC
Factor out the logic needed to update debug locations contained within
MD_loop metadata.

This refactor is preparation for a future change that also needs to
rewrite MD_loop metadata.

rdar://45507940
2020-01-15 13:02:36 -08:00
Arkady Shlykov 3f3017e162 [Loop Peeling] Add possibility to enable peeling on loop nests.
Summary:
Current peeling implementation bails out in case of loop nests.
The patch introduces a field in TargetTransformInfo structure that
certain targets can use to relax the constraints if it's
profitable (disabled by default).
Also additional option is added to enable peeling manually for
experimenting and testing purposes.

Reviewers: fhahn, lebedev.ri, xbolva00

Reviewed By: xbolva00

Subscribers: xbolva00, hiraditya, zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D70304
2020-01-15 08:25:21 -08:00
Sanjay Patel 3180af4362 [InstCombine] reassociate fsub+fsub into fsub+fadd
As discussed in the motivating PR44509:
https://bugs.llvm.org/show_bug.cgi?id=44509

...we can end up with worse code using fast-math than without.
This is because the reassociate pass greedily transforms fsub
into fneg/fadd and apparently (based on the regression tests
seen here) expects instcombine to clean that up if it wasn't
profitable. But we were missing this fold:

(X - Y) - Z --> X - (Y + Z)

There's another, more specific case that I think we should
handle as shown in the "fake" fneg test (but missed with a real
fneg), but that's another patch. That may be tricky to get
right without conflicting with existing transforms for fneg.

Differential Revision: https://reviews.llvm.org/D72521
2020-01-15 11:14:13 -05:00
Hideto Ueno 188f9a348d [Attributor] AAValueConstantRange: Value range analysis using constant range
Summary:
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).

The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).

Currently, AAValueConstantRange is created in `getAssumedConstant` method when `AAValueSimplify` returns `nullptr`(worst state).

Supported
 - BinaryOperator(add, sub, ...)
 - CmpInst(icmp eq, ...)
 - !range metadata

`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: phosek, davezarzycki, baziotis, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71620
2020-01-15 16:34:23 +09:00
Nikita Popov 04e586151e [InstCombine] Fix worklist management when removing guard intrinsic
When multiple guard intrinsics are merged into one, currently the
result of eraseInstFromFunction() is returned -- however, this
should only be done if the current instruction is being removed.
In this case we're removing a different instruction and should
instead report that the current one has been modified by returning it.

For this test case, this reduces the number of instcombine iterations
from 5 to 2 (the minimum possible).

Differential Revision: https://reviews.llvm.org/D72558
2020-01-14 21:47:48 +01:00
Nikita Popov 410331869d [NewPM] Port MergeFunctions pass
This ports the MergeFunctions pass to the NewPM. This was rather
straightforward, as no analyses are used.

Additionally MergeFunctions needs to be conditionally enabled in
the PassBuilder, but I left that part out of this patch.

Differential Revision: https://reviews.llvm.org/D72537
2020-01-14 20:55:41 +01:00
Nikita Popov 65c0805be5 [InstCombine] Fix infinite loop due to bitcast <-> phi transforms
Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up
fighting against each other, because optimizeBitCastFromPhi()
assumes that bitcasts of loads will get folded. This doesn't
happen here, because a dangling phi node prevents the one-use
fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering.

This patch fixes the issue by explicitly performing the load
combine as part of the bitcast of phi transform. Other attempts
to force the load to be combined first were ultimately too
unreliable.

Differential Revision: https://reviews.llvm.org/D71164
2020-01-14 20:45:13 +01:00
Nikita Popov b4dd928ffb [InstCombine] Make combineLoadToNewType a method; NFC
So it can be reused as part of other combines.
In particular for D71164.
2020-01-14 20:40:03 +01:00
Nikita Popov 652cd7c100 [InstCombine] Fix user iterator invalidation in bitcast of phi transform
This fixes the issue encountered in D71164. Instead of using a
range-based for, manually iterate over the users and advance the
iterator beforehand, so we do not skip any users due to iterator
invalidation.

Differential Revision: https://reviews.llvm.org/D72657
2020-01-14 20:38:10 +01:00
Teresa Johnson 2cefb93951 [ThinLTO/WPD] Remove an overly-aggressive assert
Summary:
An assert added to the index-based WPD was trying to verify that we only
have multiple vtables for a given guid when they are all non-external
linkage. This is too conservative because we may have multiple external
vtable with the same guid when they are in comdat. Remove the assert,
as we don't have comdat information in the index, the linker should
issue an error in this case.

See discussion on D71040 for more information.

Reviewers: evgeny777, aganea

Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72648
2020-01-14 10:57:14 -08:00
Juneyoung Lee 3e32b7e127 [InstCombine] Let combineLoadToNewType preserve ABI alignment of the load (PR44543)
Summary:
If aligment on `LoadInst` isn't specified, load is assumed to be ABI-aligned.
And said aligment may be different for different types.
So if we change load type, but don't pay extra attention to the aligment
(i.e. keep it unspecified), we may either overpromise (if the default aligment
of the new type is higher), or underpromise (if the default aligment
of the new type is smaller).

Thus, if no alignment is specified, we need to manually preserve the implied ABI alignment.

This addresses https://bugs.llvm.org/show_bug.cgi?id=44543 by making combineLoadToNewType preserve ABI alignment of the load.

Reviewers: spatel, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72710
2020-01-15 03:20:53 +09:00
Dmitri Gribenko 2948ec5ca9 Removed PointerUnion3 and PointerUnion4 aliases in favor of the variadic template 2020-01-14 18:56:29 +01:00
Florian Hahn 192cce10f6 Revert "Recommit "[GlobalOpt] Pass DTU to removeUnreachableBlocks instead of recomputing.""
This reverts commit a03d7b0f24.

As discussed in D68298, this causes a compile-time regression, in case
the DTs requested are not used elsewhere in GlobalOpt. We should only
get the DTs if they are available here, but this seems not possible with
the legacy pass manager from a module pass.
2020-01-14 14:50:07 +00:00
Benjamin Kramer df186507e1 Make helper functions static or move them into anonymous namespaces. NFC. 2020-01-14 14:06:37 +01:00
Hiroshi Yamauchi 7b9f8e17d1 [PGO][CHR] Guard against 0-to-0 branch weight and avoid division by zero crash.
Summary: This fixes a crash in internal builds under SamplePGO.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72653
2020-01-13 14:38:58 -08:00
Teresa Johnson 31441a3e00 [ThinLTO/WPD] Fix index-based WPD for alias vtables
Summary:
A recent fix in D69452 fixed index based WPD in the presence of
available_externally vtables. It added a cast of the vtable def
summary to a GlobalVarSummary. However, in some cases one def may be an
alias, in which case we need to get the base object before casting,
otherwise we will crash.

Reviewers: evgeny777, steven_wu, aganea

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71040
2020-01-13 13:38:26 -08:00
Simon Pilgrim 2740b2d5d5 Fix uninitialized value clang static analyzer warning. NFC. 2020-01-11 16:02:22 +00:00
Nuno Lopes 87407fc03c DSE: fix bug where we would only check libcalls for name rather than whole decl 2020-01-11 11:57:29 +00:00
Nikita Popov 0e322c8a1f [InstCombine] Preserve nuw on sub of geps (PR44419)
Fix https://bugs.llvm.org/show_bug.cgi?id=44419 by preserving the
nuw on sub of geps. We only do this if the offset has a multiplication
as the final operation, as we can't be sure the operations is nuw
in the other cases without more thorough analysis.

Differential Revision: https://reviews.llvm.org/D72048
2020-01-11 11:01:12 +01:00
Andrew Paverd bdd88b7ed3 Add support for __declspec(guard(nocf))
Summary:
Avoid using the `nocf_check` attribute with Control Flow Guard. Instead, use a
new `"guard_nocf"` function attribute to indicate that checks should not be
added on indirect calls within that function. Add support for
`__declspec(guard(nocf))` following the same syntax as MSVC.

Reviewers: rnk, dmajor, pcc, hans, aaron.ballman

Reviewed By: aaron.ballman

Subscribers: aaron.ballman, tomrittervg, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72167
2020-01-10 16:04:12 +00:00
Simon Pilgrim 2e66405d8d Don't use dyn_cast_or_null if we know the pointer is nonnull.
Fix clang static analyzer null dereference warning by using dyn_cast instead.
2020-01-10 10:32:36 +00:00
Benjamin Kramer 498856fca5 [LV] Silence unused variable warning in Release builds. NFC. 2020-01-10 11:21:27 +01:00
Gil Rapaport 8647a72c4a [LV] VPValues for memory operation pointers (NFCI)
Memory instruction widening recipes use the pointer operand of their load/store
ingredient for generating the needed GEPs, making it difficult to feed these
recipes with pointers based on other ingredients or none at all.
This patch modifies these recipes to use a VPValue for the pointer instead, in
order to reduce ingredient def-use usage by ILV as a step towards full
VPlan-based def-use relations. The recipes are constructed with VPValues bound
to these ingredients, maintaining current behavior.

Differential revision: https://reviews.llvm.org/D70865
2020-01-10 09:24:59 +02:00
Whitney Tsang d27a15fed7 [NFCI][LoopUnrollAndJam] Changing LoopUnrollAndJamPass to a function
pass.

Summary: This patch changes LoopUnrollAndJamPass to a function pass, and
keeps the loops traversal order same as defined in
FunctionToLoopPassAdaptor LoopPassManager.h.

The next patch will change the loop traversal to outer to inner order,
so more loops can be transform.

Discussion in llvm-dev mailing list:
https://groups.google.com/forum/#!topic/llvm-dev/LF4rUjkVI2g
Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto
Reviewed By: dmgreen
Subscribers: hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D72230
2020-01-09 16:18:36 +00:00
@raghesh (Raghesh Aloor) 6c04ef472a [InstCombine] Z / (1.0 / Y) => (Y * Z)
This is a special case of Z / (X / Y) => (Y * Z) / X, with X = 1.0.
The m_OneUse check is avoided because even in the case of the
multiple uses for 1.0/Y, the number of instructions remain the same
and a division is replaced by a multiplication.

Differential Revision: https://reviews.llvm.org/D72319
2020-01-09 10:52:39 -05:00
Florian Hahn ccf24225e3 [Matrix] Update shape propagation to iterate until done.
This patch updates the shape propagation to iterate until no new shape
information is discovered.

As initial seed for the forward propagation, we use the matrix intrinsic
instructions. Both propagateShapeForward and propagateShapeBackward
return new work lists, with the instructions to be used for the next
iteration. When propagating forward, we record all instructions we added
new shape information for. When propagating backward, we record all
users of instructions we added new shape information for.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70901
2020-01-09 10:52:52 +00:00
Florian Hahn 7adf6644f5 [Matrix] Propagate and use shape information for loads.
This patch extends to shape propagation to also include load
instructions and implements shape aware lowering for vector loads.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70900
2020-01-09 10:21:20 +00:00
Evgeniy Brevnov f0abe820ee [LoopUtils][NFC] Minor refactoring in getLoopEstimatedTripCount. 2020-01-09 16:49:15 +07:00
Florian Hahn 459ad8e97e [Matrix] Implement back-propagation of shape information.
This patch extends the shape propagation for matrix operations to also
propagate the shape of instructions to their operands.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70899
2020-01-09 09:48:07 +00:00
Sjoerd Meijer 8f1887456a [LV] Still vectorise when tail-folding can't find a primary inducation variable
This addresses a vectorisation regression for tail-folded loops that are
counting down, e.g. loops as simple as this:

  void foo(char *A, char *B, char *C, uint32_t N) {
    while (N > 0) {
      *C++ = *A++ + *B++;
       N--;
    }
  }

These are loops that can be vectorised, but when tail-folding is requested, it
can't find a primary induction variable which we do need for predicating the
loop. As a result, the loop isn't vectorised at all, which it is able to do
when tail-folding is not attempted. So, this adds a check for the primary
induction variable where we decide how to lower the scalar epilogue. I.e., when
there isn't a primary induction variable, a scalar epilogue loop is allowed
(i.e. don't request tail-folding) so that vectorisation could still be
triggered.

Having this check for the primary induction variable make sense anyway, and in
addition, in a follow-up of this I will look into discovering earlier the
primary induction variable for counting down loops, so that this can also be
tail-folded.

Differential revision: https://reviews.llvm.org/D72324
2020-01-09 09:14:00 +00:00
Johannes Doerfert a4088c75cc [Attributor][FIX] Carefully change invokes to calls (after manifest)
Before we manually inserted unreachable early but that could lead to
broken PHI nodes. Now we use the existing late modification
functionality.
2020-01-08 19:32:38 -06:00
Johannes Doerfert 1e46eb74be [Attributor][FIX] Avoid dangling value pointers during code modification
When we replace instructions with unreachable we delete instructions. We
now avoid dangling pointers to those deleted instructions in the
`ToBeChangedToUnreachableInsts` set. Other modification collections
might need to be updated in the future as well.
2020-01-08 19:32:37 -06:00
Kazu Hirata 2d258ed931 Revert "[JumpThreading] Thread jumps through two basic blocks"
It looks like my patch breaks the sanitizer-windows build:

http://lab.llvm.org:8011/builders/sanitizer-windows/builds/56324

This reverts commit ead815924e.
2020-01-08 13:58:39 -08:00
Kazu Hirata ead815924e [JumpThreading] Thread jumps through two basic blocks
Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

  bb3:
    %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above.  Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

  bb3:
    %var = phi i32* [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb3.dup:
    %var = phi i32* [ null, %bb1 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70247
2020-01-08 06:57:36 -08:00
Kadir Cetinkaya b212eb7159
Revert "[InstCombine] fold zext of masked bit set/clear"
This reverts commit a041c4ec6f.

This looks like a non-trivial change and there has been no code
reviews (at least there were no phabricator revisions attached to the
commit description). It is also causing a regression in one of our
downstream integration tests, we haven't been able to come up with a
minimal reproducer yet.
2020-01-08 11:21:21 +01:00
Philip Reames 312a532dc0 [GVN/FP] Considate logic for reasoning about equality vs equivalance for floats
Factor out common logic into some reasonable commented helper functions. In the process, ensure that the in-block vs cross-block cases are handled the same. They previously weren't.

Differential Revision: https://reviews.llvm.org/D67126
2020-01-07 16:05:04 -08:00
Sanjay Patel f8962571f7 [InstCombine] try to pull 'not' of select into compare operands
not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?) -->
     select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?)

If both sides of the select are cmps, we can remove an instruction.
The case where only side is a cmp is deferred to a possible
follow-on patch.

We have a more general 'isFreeToInvert' analysis, but I'm not seeing
a way to use that more widely without inducing infinite looping
(opposing transforms).
Here, we flip the compare predicates directly, so we should not have
any danger by creating extra intermediate 'not' ops.

Alive proofs:
https://rise4fun.com/Alive/jKa

Name: both select values are compares - invert predicates
  %tcmp = icmp sle i32 %x, %y
  %fcmp = icmp ugt i32 %z, %w
  %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
  %not = xor i1 %sel, true
=>
  %tcmp_not = icmp sgt i32 %x, %y
  %fcmp_not = icmp ule i32 %z, %w
  %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

Name: false val is compare - invert/not
  %fcmp = icmp ugt i32 %z, %w
  %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
  %not = xor i1 %sel, true
=>
  %tcmp_not = xor i1 %tcmp, -1
  %fcmp_not = icmp ule i32 %z, %w
  %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

Differential Revision: https://reviews.llvm.org/D72007
2020-01-07 10:44:23 -05:00
Simon Pilgrim bd1dc6a3eb Fix "use of uninitialized variable" static analyzer warnings. NFCI. 2020-01-07 10:55:37 +00:00
Fangrui Song 6904cd9486 Add Triple::isX86()
Reviewed By: craig.topper, skan

Differential Revision: https://reviews.llvm.org/D72247
2020-01-06 15:51:02 -08:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Brian Gesiak 83a9321f60 [Coroutines] Remove corresponding phi values when apply simplifyTerminatorLeadingToRet
Summary:
In addMustTailToCoroResumes, we set musttail on those resume instructions that are followed by a ret instruction. This is done by simplifyTerminatorLeadingToRet which replace a sequence of branches leading to a ret with a clone of the ret.

However it forgets to remove corresponding PHI values that come from basic block of replaced branch, and may cause jumpthreading pass hangs (https://bugs.llvm.org/show_bug.cgi?id=43720)

This patch fix this issue

Test Plan:
cppcoro library with O3+flto
check-llvm

Reviewers: modocache, GorNishanov, lewissbaker

Reviewed By: modocache

Subscribers: mehdi_amini, EricWF, hiraditya, dexonsmith, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71826

Patch by junparser (JunMa)!
2020-01-05 18:26:30 -05:00
Florian Hahn b8a3c34eee Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)."
This reverts commit 51ef53f3bd, as it
breaks some bots.
2020-01-04 18:44:38 +00:00
Florian Hahn 51ef53f3bd [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-01-04 18:29:35 +00:00
Florian Hahn 99f74a64a2 [SCEV] Remove unused ScalarEvolutionExpander.h includes (NFC). 2020-01-04 18:29:35 +00:00
Roman Lebedev 6d05bc2e3a
[NFCI][InstCombine] Refactor 'sink negation into select if that folds one hand of select to 0' fold
I would think it's better than having two practically identical folds
next to eachother, but then generalization isn't all that pretty
due to the fact that we need to produce different `sub` each time..

This change is no-functional-changes-intended refactoring.
2020-01-04 17:30:51 +03:00
Roman Lebedev 772ede3d5d
[InstCombine] Sink sub into hands of select if one hand becomes zero. Part 2 (PR44426)
This decreases use count of %Op0, makes one hand of select to be 0,
and possibly exposes further folding potential.

Name: sub %Op0, (select %Cond, %Op0, %FalseVal) -> select %Cond, 0, (sub %Op0, %FalseVal)
  %Op0 = %TrueVal
  %o = select i1 %Cond, i8 %Op0, i8 %FalseVal
  %r = sub i8 %Op0, %o
=>
  %n = sub i8 %Op0, %FalseVal
  %r = select i1 %Cond, i8 0, i8 %n

Name: sub %Op0, (select %Cond, %TrueVal, %Op0) -> select %Cond, (sub %Op0, %TrueVal), 0
  %Op0 = %FalseVal
  %o = select i1 %Cond, i8 %TrueVal, i8 %Op0
  %r = sub i8 %Op0, %o
=>
  %n = sub i8 %Op0, %TrueVal
  %r = select i1 %Cond, i8 %n, i8 0

https://rise4fun.com/Alive/aHRt

https://bugs.llvm.org/show_bug.cgi?id=44426
2020-01-04 17:30:51 +03:00
Roman Lebedev 4d8e47ca18
[InstCombine] Sink sub into hands of select if one hand becomes zero (PR44426)
This decreases use count of %Op1, makes one hand of select to be 0,
and possibly exposes further folding potential.

Name: sub (select %Cond, %Op1, %FalseVal), %Op1 -> select %Cond, 0, (sub %FalseVal, %Op1)
  %Op1 = %TrueVal
  %o = select i1 %Cond, i8 %Op1, i8 %FalseVal
  %r = sub i8 %o, %Op1
=>
  %n = sub i8 %FalseVal, %Op1
  %r = select i1 %Cond, i8 0, i8 %n

Name: sub (select %Cond, %TrueVal, %Op1), %Op1 -> select %Cond, (sub %TrueVal, %Op1), 0
  %Op1 = %FalseVal
  %o = select i1 %Cond, i8 %TrueVal, i8 %Op1
  %r = sub i8 %o, %Op1
=>
  %n = sub i8 %TrueVal, %Op1
  %r = select i1 %Cond, i8 %n, i8 0

https://rise4fun.com/Alive/avL

https://bugs.llvm.org/show_bug.cgi?id=44426
2020-01-04 17:30:51 +03:00
Alexey Lapshin 831bfcea47 [Transforms][GlobalSRA] huge array causes long compilation time and huge memory usage.
Summary:
For artificial cases (huge array, few usages), Global SRA optimization creates
a lot of redundant data. It creates an instance of GlobalVariable for each array
element. For huge array, that means huge compilation time and huge memory usage.
Following example compiles for 10 minutes and requires 40GB of memory.

namespace {
  char LargeBuffer[64 * 1024 * 1024];
}

int main ( void ) {

    LargeBuffer[0] = 0;

    printf("\n ");

    return LargeBuffer[0] == 0;
}

The fix is to avoid Global SRA for large arrays.

Reviewers: craig.topper, rnk, efriedma, fhahn

Reviewed By: rnk

Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71993
2020-01-04 16:42:38 +03:00
Roman Lebedev 7973aa05f6
[NFC][InstCombine] '(Op1 & С) - Op1' -> '-(Op1 & ~C)' fold (PR44427)
This decreases use count of Op1, potentially allows
us to further hoist said 'neg' later on,
and results in marginally better X86 codegen.

Name: (Op1 & С) - Op1 -> -(Op1 & ~C)
  %o = and i64 %Op1, C1
  %r = sub i64 %o, %Op1
=>
  %n = and i64 %Op1, ~C1
  %r = sub i64 0, %n

https://rise4fun.com/Alive/rwgA

https://godbolt.org/z/R_RMfM

https://bugs.llvm.org/show_bug.cgi?id=44427
2020-01-03 21:25:48 +03:00
Roman Lebedev cc0216bedb
[NFC][InstCombine] '(X & (- Y)) - X' -> '- (X & (Y - 1))' fold (PR44448)
Name: (X & (- Y)) - X  ->  - (X & (Y - 1))  (PR44448)
  %negy = sub i8 0, %y
  %unbiasedx = and i8 %negy, %x
  %r = sub i8 %unbiasedx, %x
=>
  %ymask = add i8 %y, -1
  %xmasked = and i8 %ymask, %x
  %r = sub i8 0, %xmasked

https://rise4fun.com/Alive/OIpla

This decreases use count of %x, may allow us to
later hoist said negation even further,
and results in marginally nicer X86 codegen.

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 20:27:29 +03:00
Johannes Doerfert d2d2fb19f7 [Attributor][FIX] Allow dead users of rewritten function
If we replace a function with a new one because we rewrite the
signature, dead users may still refer to the old version. With this
patch we reuse the code that deals with dead functions, which the old
versions are, to avoid problems.
2020-01-03 10:43:40 -06:00
Johannes Doerfert 6b9ee2d6cd [Attributor][NFC] Unify the way we delete dead functions 2020-01-03 10:43:40 -06:00
Johannes Doerfert c90681b681 [Attributor][FIX] Don't crash on ptr2int/int2ptr instructions
An integer isn't allowed in getAlignmentForValue so we need to stop at a
ptr2int instruction during exploration.
2020-01-03 10:43:40 -06:00
Johannes Doerfert 412a0101a9 [Attributor][FIX] Do not derive nonnull and dereferenceable w/o access
An inbounds GEP results in poison if the value is not "inbounds", not in
UB. We accidentally derived nonnull and dereferenceable from these
inbounds GEPs even in the absence of accesses that would make the poison
to UB.
2020-01-03 10:43:40 -06:00
Johannes Doerfert a4b3588ba2 [Attributor][FIX] Return CHANGED once a pessimistic fixpoint is reached. 2020-01-03 10:43:40 -06:00
Ankit 369a919514 Fix for a dangling point bug in DeadStoreElimination pass
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.

While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.

In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.

Reviewers: fhahn, bcahoon, efriedma

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D65326
2020-01-03 14:28:44 +00:00
Sanjay Patel 1640582743 [InstCombine] replace undef elements in vector constant when doing icmp folds (PR44383)
As shown in P44383:
https://bugs.llvm.org/show_bug.cgi?id=44383
...we can't safely propagate a vector constant through this icmp fold
if that vector constant contains undefined elements.

We know that each defined element of the constant is safe though, so
find the first of those and replicate it into the formerly undef lanes.

Differential Revision: https://reviews.llvm.org/D72101
2020-01-03 09:16:57 -05:00
Hideto Ueno 5fc02dc0a7 Revert "[Attributor] AAValueConstantRange: Value range analysis using constant range"
This reverts commit e996303431.
2020-01-03 11:03:56 +09:00
Sanjay Patel 88fc5fdef6 [InstCombine] remove uses before deleting instructions (PR43723)
This is a less ambitious alternative to previous attempts to fix
this bug with:
rG56b2aee1875a
rGef02831f0a4e
rG56b2aee1875a
...because those all failed bot testing with use-after-free or
other problems.

The original crashing/assert problem is still showing up on
various fuzzers, so I've added a new minimal test based on
another one of those failures.

Instead of trying to manage and coordinate the logic in
isAllocSiteRemovable() with the deletion loops, just loosen
the existing code that handles casts and GEP by replacing
with undef to allow other opcodes. That means that no
instructions with uses should assert on deletion, and there
are hopefully no non-obvious sanitizer bugs induced.
2020-01-02 09:47:36 -05:00
Brian Gesiak 9ce0ff2eef [Coroutines] const-ify internal helpers (NFC)
Several helpers internal to llvm/Transforms/Coroutines do not use
'const' for parameters that are not modified. Add const where possible.
2020-01-01 21:57:49 -05:00
Brian Gesiak 2fcf7691df [Coroutines] Rename "legacy" passes (NFC)
A series of patches beginning with https://reviews.llvm.org/D71898
propose to add an implementation of the coroutine passes to the new pass
manager. As part of these changes, the coroutine passes that implement
the legacy pass manager interface are renamed, to `<PassName>Legacy`.
This mirrors similar changes that have been made to many other passes in
LLVM as they've been transitioned to support both old and new pass
managers.

This commit splits out the renaming portion of that patch and commits it
in advance as an NFC (no functional change intended) commit. It renames:

* `CoroEarly` => `CoroEarlyLegacy`
* `CoroSplit` => `CoroSplitLegacy`
* `CoroElide` => `CoroElideLegacy`
* `CoroCleanup` => `CoroCleanupLegacy`
2020-01-01 21:41:16 -05:00
Nikita Popov 8dd9a13619 [InstCombine] Preserve inbounds when merging with zero-index GEP (PR44423)
This addresses https://bugs.llvm.org/show_bug.cgi?id=44423.
If one of the GEPs is inbounds and the other is zero-index,
we can also preserve inbounds.

Differential Revision: https://reviews.llvm.org/D72060
2020-01-01 23:04:28 +01:00
Nikita Popov 6ba5f8c4ac [InstCombine] Fix incorrect inbounds on GEP of GEP (PR44425)
This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to
drop inbounds if one of the GEPs is not inbounds. This was already
done when creating a new GEP, but not when modifying in place.

Differential Revision: https://reviews.llvm.org/D72059
2020-01-01 22:10:55 +01:00
Hideto Ueno e996303431 [Attributor] AAValueConstantRange: Value range analysis using constant range
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).

The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).

Currently, AAValueConstantRange is created when AAValueSimplify cannot
simplify the value.

Supported
 - BinaryOperator(add, sub, ...)
 - CmpInst(icmp eq, ...)
 - !range metadata

`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D71620
2020-01-01 15:35:56 +09:00
Craig Topper 374e0299cf [X86][InstCombine] Add constant folding and simplification support for pdep and pext
The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result.

There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted.

Fixes PR44389

Differential Revision: https://reviews.llvm.org/D71952
2019-12-31 15:06:47 -08:00
Sanjay Patel a041c4ec6f [InstCombine] fold zext of masked bit set/clear
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8

We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.

Proofs:
https://rise4fun.com/Alive/uVB

  Name: masked bit set
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp ne i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %s = lshr i32 %x, %y
  %r = and i32 %s, 1

  Name: masked bit clear
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp eq i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %xn = xor i32 %x, -1
  %s = lshr i32 %xn, %y
  %r = and i32 %s, 1
2019-12-31 12:35:10 -05:00
Nikita Popov 7adb5c2aca Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms"
This reverts commit 27a0795943.

Seems to break test-suite.
2019-12-31 17:42:57 +01:00
Nikita Popov 27a0795943 [InstCombine] Fix infinite loop due to bitcast <-> phi transforms
Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up
fighting against each other, because optimizeBitCastFromPhi()
assumes that bitcasts of loads will get folded. This doesn't happen
here, because a dangling phi node prevents the one-use fold in
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering.

This patch fixes the issue by adding manually removing the old phis.

Differential Revision: https://reviews.llvm.org/D71164
2019-12-31 16:17:14 +01:00
Connor Abbott fb114694e9 [InstCombine] Don't rewrite phi-of-bitcast when the phi has other users
Judging by the existing comments, this was the intention, but the
transform never actually checked if the existing phi's would be removed.
See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where
this causes much worse code generation on AMDGPU.

Differential Revision: https://reviews.llvm.org/D71209
2019-12-31 12:15:02 +01:00
Ilya Biryukov 4f82af81a0 [Attributor] Suppress unused warnings when assertions are disabled. NFC 2019-12-31 10:21:52 +01:00
Johannes Doerfert 751336340d [Attributor] Function signature rewrite infrastructure
As part of the Attributor manifest we want to change the signature of
functions. This patch introduces a fairly generic interface to do so.
As a first, very simple, use case, we remove unused arguments. A second
use case, pointer privatization, will be committed with this patch as
well.

A lot of the code and ideas are taken from argument promotion and we
run all argument promotion tests through this framework as well.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D68765
2019-12-31 02:31:33 -06:00
Johannes Doerfert dada8132af [Attributor] Propagate known align from arguments to call sites arguments
Since the information is known we can simply use it at the call site.
This is especially useful for callbacks but also helps regular calls.

The test changes are mechanical.
2019-12-31 01:33:22 -06:00
Johannes Doerfert b1b441d22d [Attributor] Use abstract call sites to determine associated arguments
This is the second step after D67871 to make use of abstract call sites.
In this patch the argument we associate with a abstract call site
argument can be the one in the callback callee instead of the one in the
callback broker.

Caveat: We cannot allow no-alias arguments for problematic callbacks:
As described in [1], adding no-alias (or restrict) to arguments could
break synchronization as the synchronization effect, e.g., a barrier,
does not "alias" with the pointer anymore. This disables no-alias
annotation for potentially problematic arguments until we implement the
fix described in [1].

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D68008

[1] Compiler Optimizations for OpenMP, J. Doerfert and H. Finkel,
    International Workshop on OpenMP 2018,
    http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
2019-12-31 01:33:22 -06:00
Johannes Doerfert 2888019871 [Attributor] Annotate the memory behavior of call site arguments
Especially for callbacks, annotating the call site arguments is
important. Doing so exposed a too strong dependence of AAMemoryBehavior
on AANoCapture since we handle the case of potentially captured pointers
explicitly.

The changes to the tests are all mechanical.
2019-12-31 01:33:21 -06:00
Sanjay Patel 987eb8e26c [InstCombine] propagate sign argument through nested copysigns
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
2019-12-30 11:06:02 -05:00
Evgeniy Brevnov 948e745270 [LV][NFC] Keep dominator tree up to date during vectorization. 2019-12-30 18:38:41 +07:00
Evgeniy Brevnov 1b6286b945 [LV][NFC] Some refactoring and renaming to facilitate next change. 2019-12-30 18:38:41 +07:00
Hideto Ueno 34fe8d0451 [Attributor] Use `changeUseAfterManifest` in AAValueSimplify manifest
Summary: This patch makes `AAValueSimplify` use `changeUsesAfterManifest` in `manifest`. This will invoke simple folding after the manifest.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71972
2019-12-30 17:08:48 +09:00
Hideto Ueno ef4febd85b [Attributor] AAUndefinedBehavior: Check for branches on undef value.
A branch is considered UB if it depends on an undefined / uninitialized value.
At this point this handles simple UB branches in the form: `br i1 undef, ...`
We query `AAValueSimplify` to get a value for the branch condition, so the branch
can be more complicated than just: `br i1 undef, ...`.

Patch By: Stefanos Baziotis (@baziotis)

Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D71799
2019-12-29 17:43:00 +09:00
Gil Rapaport d62bf16131 [LV] Use getMask() when printing recipe [NFCI]
Use dedicated API for getting the mask instead of duplicating it.

Differential Revision: https://reviews.llvm.org/D71964
2019-12-29 08:50:40 +02:00
Florian Hahn dc2c9b0fcf [Matrix] Propagate and use shape info for binary operators.
This patch extends the current shape propagation and shape aware
lowering to also support binary operators. Those operators are uniform
with respect to their shape (shape of the input operands is the same as
the shape of their result).

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70898
2019-12-27 15:50:47 +00:00
Fangrui Song 7a7334663c Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821
Intrinsic has incorrect argument type!
  i32 (i32*)* @llvm.setjmp

*wipes tear*
2019-12-27 00:00:14 -08:00
Hideto Ueno cb5eb13eaf [Attributor] Add helper to change an instruction to `unreachable` inst
Summary: Calling `changeToUnreachable` in `manifest` from different places might cause really unpredictable problems. As other deleting functions are doing, we need to change these instructions after all `manifest`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71910
2019-12-27 02:39:37 +09:00
Whitney Tsang d1f41b2ca9 [NFC][LoopFusion] Fix printing of the guard branch.
Reviewer: kbarton, jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71878
2019-12-26 02:45:29 +00:00
Hideto Ueno 1d5d074aef [Attributor] Reach optimistic fixpoint in AAValueSimplify when the value is constant or undef
Summary:
As discussed in D71799, we have found that it is more useful to reach an optimistic fixpoint in AAValueSimpify when the value is constant or undef.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: baziotis, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71852
2019-12-25 14:18:34 +09:00
Johannes Doerfert 5732f56bbd [Attributor] UB Attribute now handles all instructions that access memory through a pointer
Summary:
Follow-up on: https://reviews.llvm.org/D71435
We basically use `checkForAllInstructions` to loop through all the instructions in a function that access memory through a pointer: load, store, atomicrmw, atomiccmpxchg
Note that we can now use the `getPointerOperand()` that gets us the pointer operand for an instruction that belongs to the aforementioned set.

Question: This function returns `nullptr` if the instruction is `volatile`. Why?
Guess:  Because if it is volatile, we don't want to do any transformation to it.

Another subtle point is that I had to add AtomicRMW, AtomicCmpXchg to `initializeInformationCache()`. Following `checkAllInstructions()` path, that
seemed the most reasonable place to add it and correct the fact that these instructions were ignored (they were not in `OpcodeInstMap` etc.). Is that ok?

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert, sstefan1

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71787
2019-12-24 19:25:08 -06:00
Johannes Doerfert 58f324a468 [Attributor] Function level undefined behavior attribute
_Eventually_, this attribute will be assigned to a function if it
contains undefined behavior. As a first small step, I tried to make it
loop through the load instructions in a function (eventually, the plan
is to check if a load instructions causes undefined behavior, because
e.g. dereferences a null pointer - Also eventually, this won't happen in
initialize() but in updateImpl()).

Patch By: Stefanos Baziotis (@baziotis)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D71435
2019-12-24 19:23:08 -06:00
Florian Hahn 8d6f59b78a [Matrix] Use fmuladd for matrix.multiply if allowed.
If the matrix.multiply calls have the contract fast math flag, we can
use fmuladd. This als adds a command line option to force fmuladd
generation. We can retire this option once there is a clang-level
option.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70951
2019-12-23 14:49:14 +01:00
Florian Hahn 109e4e3851 [Matrix] Add forward shape propagation and first shape aware lowerings.
This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to make use of
the shape information to break up larger vector operations and to
eliminate unnecessary conversion operations between columnwise matrixes
and flattened vectors: if shape information is available for an
instruction, lower the operation to a set of instructions operating on
columns. For example, a store of a matrix is broken down into separate
stores for each column. For users that do not have shape
information (e.g. because they do not yet support shape information
aware lowering), we pack the result columns into a flat vector and
update those users.

It also adds shape aware lowering for the first non-intrinsic
instruction: vector stores.

Example:

For
  %c  = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
  store <4 x double> %c, <4 x double>* %Ptr

We generate the code below without shape propagation. Note %9 which
combines the columns of the transposed matrix into a flat vector.

  %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
  %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
  %1 = extractelement <2 x double> %split, i64 0
  %2 = insertelement <2 x double> undef, double %1, i64 0
  %3 = extractelement <2 x double> %split1, i64 0
  %4 = insertelement <2 x double> %2, double %3, i64 1
  %5 = extractelement <2 x double> %split, i64 1
  %6 = insertelement <2 x double> undef, double %5, i64 0
  %7 = extractelement <2 x double> %split1, i64 1
  %8 = insertelement <2 x double> %6, double %7, i64 1
  %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  store <4 x double> %9, <4 x double>* %Ptr

With this patch, we propagate the 2x2 shape information from the
transpose to the store and we generate the code below. Note that we
store the columns directly and do not need an extra shuffle.

  %9 = bitcast <4 x double>* %Ptr to double*
  %10 = bitcast double* %9 to <2 x double>*
  store <2 x double> %4, <2 x double>* %10, align 8
  %11 = getelementptr double, double* %9, i32 2
  %12 = bitcast double* %11 to <2 x double>*
  store <2 x double> %8, <2 x double>* %12, align 8

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70897
2019-12-23 13:51:56 +01:00
Dinar Temirbulatov a755ccefe6 [SLP] Replace NeedToGather variable with enum. 2019-12-23 08:21:53 +01:00
Mark de Wever 098d3347e7 [Transforms] Fixes -Wrange-loop-analysis warnings
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D71810
2019-12-22 19:20:17 +01:00
Sanjay Patel 9cdcd81d3f [InstCombine] enhance fold for copysign with known sign arg
This is another optimization suggested in PRPR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
2019-12-22 10:07:01 -05:00
Sanjay Patel 79c7fa31f3 [InstCombine] check alloc size in bitcast of geps fold (PR44321)
We missed a constraint in D44833
when folding a bitcast into a GEP with vector/array types.
If the alloc sizes specified by the datalayout don't match,
this could miscompile as shown in:
https://bugs.llvm.org/show_bug.cgi?id=44321

Differential Revision: https://reviews.llvm.org/D71771
2019-12-21 10:31:21 -05:00
Sanjay Patel 19f9f374d9 [SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms
As discussed in PR44330:
https://bugs.llvm.org/show_bug.cgi?id=44330
...the transform from pow(X, -0.5) libcall/intrinsic to
reciprocal square root can result in small deviations from
the expected result due to differences in the pow()
implementation and/or the extra rounding step from the division.

This patch proposes to allow that difference with either the
'approximate functions' or 'reassociate' FMF:
http://llvm.org/docs/LangRef.html#fast-math-flags

In practice, this likely means that the code is compiled with
all of 'fast' (-ffast-math), but I have preserved the existing
specializations for -0.0/-INF that enable generating safe code
if those special values are allowed simultaneously with
allowing approximation/reassociation.

The question about whether a similar restriction is needed for
the non-reciprocal case -- pow(X, 0.5) -- is deferred. That
transform is allowed without FMF currently, and this patch does
not change that behavior.

Differential Revision: https://reviews.llvm.org/D71706
2019-12-21 10:00:53 -05:00
Jakub Kuderski c431c407eb [InstCombine] Improve infinite loop detection
Summary:
This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop.

Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details.

The two limits can be specified via command line options.

Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71673
2019-12-20 16:15:04 -05:00
Ayal Zaks e498be5738 [LV] Strip wrap flags from vectorized reductions
A sequence of additions or multiplications that is known not to wrap, may wrap
if it's order is changed (i.e., reassociated). Therefore when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.

Fixes PR43828

Patch by Denis Antrushin

Differential Revision: https://reviews.llvm.org/D69563
2019-12-20 14:48:53 +02:00
Vedant Kumar caaacb8399 HotColdSplitting: Do not outline within noreturn functions
A function marked `noreturn` may contain unreachable terminators: these
should not be considered cold, as the function may be a trampoline.

rdar://58068594
2019-12-19 14:06:24 -08:00
Bjorn Pettersson 89e3bb4502 [ConstantHoisting] Ignore unreachable bb:s when collecting candidates
Summary:
Ignore looking at blocks that are unreachable from entry when
collecting candidates for hosting.

Normally the consthoist pass is executed in the llc pipeline,
just after unreachableblockelim. So it is abnormal to have code
that is unreachable from the entry block. But when running the
pass as part of opt, for example as part of fuzzy testing, we
might trigger various kinds of asserts when collecting candidates
if we include unreachable blocks in that analysis.

It seems like a waste of time to hoist constants in unreachble
blocks, so the solution is to simply ignore such blocks when
collecting the hoisting candidates.

The two added test cases used to end up in two different asserts,
and the intention with the checks is just to verify that we no
longer fail.

Fixes: PR43903

Reviewers: spatel

Reviewed By: spatel

Subscribers: hiraditya, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71678
2019-12-19 15:07:55 +01:00
David Green a59cc5e128 [InstCombine] Canonicalize select immediates
In certain situations after inlining and simplification we end up with
code that is _almost_ a min/max pattern, but contains constants that
have been demand-bit optimised to the wrong values, ending up with code
like:
  %1 = icmp slt i32 %shr, -128
  %2 = select i1 %1, i32 128, i32 %shr
  %.inv = icmp sgt i32 %shr, 127
  %spec.select.i = select i1 %.inv, i32 127, i32 %2
  %conv7 = trunc i32 %spec.select.i to i8
This should be turned into a min/max pattern, but the -128 in the first
select was instead transformed into 128, as only the bottom byte was
ever demanded.

To fix this, I've put in further canonicalisation for the immediates of
selects, preferring to use the same value as the icmp if available.

Differential Revision: https://reviews.llvm.org/D71516
2019-12-19 12:36:46 +00:00
Piotr Sobczak 40b5a0f7c8 Revert "[InstCombine][AMDGPU] Trim more components of *buffer_load"
Revert D70315, as it breaks gfx8 for some reason.

This reverts commit 65f94b3380.
2019-12-18 22:04:44 +01:00
Kit Barton 3db1cf7a1e [LoopFusion] Use the LoopInfo::isRotatedForm method (NFC).
Loop fusion previously had a method to check whether a loop was in rotated form. This method has
been moved into the LoopInfo class. This patch removes the old isRotated method from loop fusion,
in favour of the new one in LoopInfo.
2019-12-18 15:04:25 -05:00
Jakub Kuderski 3d29c41ad5 [InstCombine] Insert instructions before adding them to worklist
Summary:
This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions.
It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.

Simple test case that illustrates the difference when run with `--debug-only=instcombine`:

```
define i32 @test35(i32 %a, i32 %b) {
  %1 = or i32 %a, 1135
  %2 = or i32 %1, %b
  ret i32 %2
}
```

Before this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   <badref> = or i32 %2, 1135
...
```

With this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   %3 = or i32 %2, 1135
...
```

Reviewers: fhahn, davide, spatel, foad, grosser, nikic

Reviewed By: nikic

Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71093
2019-12-18 14:55:41 -05:00
Jakub Kuderski 406b6019cd [InstCombine] Allow to limit the max number of iterations
Summary:
This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions.

InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites.

Bounding the number of iterations can have 2 benefits:
* In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything.
* When the wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can be also useful for implementing a lightweight pass pipeline (think `-O1`).

This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in.

Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00

Reviewed By: spatel

Subscribers: craig.topper, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71145
2019-12-18 13:48:54 -05:00
stozer 89d19d60ad Reapply: [DebugInfo] Correctly handle salvaged casts and split fragments at ISel
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.

The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
2019-12-18 16:26:42 +00:00
Whitney Tsang 9883d7edc6 [LoopUtils] Updated deleteDeadLoop() to handle loop nest.
Reviewer: kariddi, sanjoy, reames, Meinersbur, bmahjour, etiotto,
kbarton
Reviewed By: Meinersbur
Subscribers: mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D70939
2019-12-18 15:59:45 +00:00
stozer 1f3dd83cc1 Revert "[DebugInfo] Correctly handle salvaged casts and split fragments at ISel"
Reverted due to build failure on windows bots.

This reverts commit bb1b0bc4e5.
2019-12-18 11:46:10 +00:00
stozer bb1b0bc4e5 [DebugInfo] Correctly handle salvaged casts and split fragments at ISel
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.

There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
2019-12-18 11:09:18 +00:00
Anna Welker 7cd1cfdd6b [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.

Differential Revision: https://reviews.llvm.org/D71610
2019-12-18 09:14:39 +00:00
Whitney Tsang 36bdc3dc35 [LoopFusion] Move instructions from FC0.Latch to FC1.Latch.
Summary:This PR move instructions from FC0.Latch bottom up to the
beginning of FC1.Latch as long as they are proven safe.

To illustrate why this is beneficial, let's consider the following
example:
Before Fusion:
header1:
  br header2
header2:
  br header2, latch1
latch1:
  br header1, preheader3
preheader3:
  br header3
header3:
  br header4
header4:
  br header4, latch3
latch3:
  br header3, exit3

After Fusion (before this PR):
header1:
  br header2
header2:
  br header2, latch1
latch1:
  br header3
header3:
  br header4
header4:
  br header4, latch3
latch3:
  br header1, exit3

Note that preheader3 is removed during fusion before this PR.
Notice that we cannot fuse loop2 with loop4 as there exists block latch1
in between.
This PR move instructions from latch1 to beginning of latch3, and remove
block latch1. LoopFusion is now able to fuse loop nest recursively.

After Fusion (after this PR):
header1:
  br header2
header2:
  br header3
header3:
  br header4
header4:
  br header2, latch3
latch3:
  br header1, exit3

Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: kbarton, Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71165
2019-12-17 22:10:23 +00:00
Stefan Stipanovic fff8ec9813 [Attributor] H2S fix.
Summary: Fixing issues that were noticed in D71521

Reviewers: jdoerfert, lebedev.ri, uenoku

Subscribers:

Differential Revision: https://reviews.llvm.org/D71564
2019-12-17 20:41:09 +01:00
Piotr Sobczak 65f94b3380 [InstCombine][AMDGPU] Trim more components of *buffer_load
Summary:
Add trimming of unused components of s_buffer_load.

Extend trimming of *buffer_load to also include
unused components at the beginning of vectors and update offset.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70315
2019-12-17 17:50:07 +01:00
Guillaume Chatelet 531c1161b9 Resubmit "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove"
Summary:
This is a resubmit of D71473.

This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: aaron.ballman, courbet

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71547
2019-12-17 10:07:46 +01:00
Whitney Tsang ec4749e3b8 Revert "[LoopUtils] Updated deleteDeadLoop() to handle loop nest."
This reverts commit cd09fee3d6.
This reverts commit c066ff11d8.
2019-12-17 03:51:41 +00:00
Johannes Doerfert 0bc3336ac1 [Attributor][NFC] Clang format the Attributor
The Attributor is always kept formatted so diffs are cleaner.

Sometime we get out of sync for various reasons so we need to format the
file once in a while.
2019-12-16 21:03:18 -06:00
Whitney Tsang c066ff11d8 [LoopUtils] Updated deleteDeadLoop() to handle loop nest.
Reviewer: kariddi, sanjoy, reames, Meinersbur, bmahjour, etiotto,
kbarton
Reviewed By: Meinersbur
Subscribers: mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D70939
2019-12-17 01:06:14 +00:00
Kit Barton ff07fc66d9 [LoopFusion] Restrict loop fusion to rotated loops.
Summary:
This patch restricts loop fusion to only consider rotated loops as valid candidates.
This simplifies the analysis and transformation and aligns with other loop optimizations.

Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel

Reviewed By: Meinersbur

Subscribers: ormris, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71025
2019-12-16 15:17:29 -05:00
Craig Topper 02f644c59a [InstCombine] Teach removeBitcastsFromLoadStoreOnMinMax not to change the size of a store.
We can change the type as long as we don't change the size.

Fixes PR44306

Differential Revision: https://reviews.llvm.org/D71532
2019-12-16 12:12:54 -08:00
Guillaume Chatelet 4658da10e4 Revert "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove"
This reverts commit 181ab91efc.
2019-12-16 15:19:49 +01:00
Guillaume Chatelet 181ab91efc [Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove
Summary:
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71473
2019-12-16 13:35:55 +01:00
Bjorn Pettersson e5f07080b8 [BasicBlockUtils] Fix dbg.value elimination problem in MergeBlockIntoPredecessor
Summary:
In commit d60f34c20a (llvm-svn 317128,
PR35113) MergeBlockIntoPredecessor was changed into
discarding some dbg.value intrinsics referring to
PHI values, post-splice due to loop rotation.

That elimination of dbg.value intrinsics did not
consider which dbg.value to keep depending on the
context (e.g. if the variable is changing its value
several times inside the basic block).

In the past that hasn't been such a big problem since
CodeGenPrepare::placeDbgValues has moved the dbg.value
to be next to the PHI node anyway. But after commit
00e238896c CodeGenPrepare isn't doing that
any longer, so we need to be more careful when avoiding
duplicate dbg.value intrinsics in MergeBlockIntoPredecessor.

This patch replaces the code that tried to avoid duplicate
dbg.values by using the RemoveRedundantDbgInstrs helper.

Reviewers: aprantl, jmorse, vsk

Reviewed By: aprantl, vsk

Subscribers: jholewinski, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71480
2019-12-16 11:41:21 +01:00
Bjorn Pettersson 1c49553c19 [BasicBlockUtils] Add utility to remove redundant dbg.value instrs
Summary:
Add a RemoveRedundantDbgInstrs to BasicBlockUtils with the
goal to remove redundant dbg intrinsics from a basic block.

This can be useful after various transforms, as it might
be simpler to do a filtering of dbg intrinsics after the
transform than during the transform.
One primary use case would be to replace a too aggressive
removal done by MergeBlockIntoPredecessor, seen at loop
rotate (not done in this patch).

The elimination algorithm currently focuses on dbg.value
intrinsics and is doing two iterations over the BB.

First we iterate backward starting at the last instruction
in the BB. Whenever a consecutive sequence of dbg.value
instructions are found we keep the last dbg.value for
each variable found (variable fragments are identified
using the  {DILocalVariable, FragmentInfo, inlinedAt}
triple as given by the DebugVariable helper class).

Next we iterate forward starting at the first instruction
in the BB. Whenever we find a dbg.value describing a
DebugVariable (identified by {DILocalVariable, inlinedAt})
we save the {DIValue, DIExpression} that describes that
variables value. But if the variable already was mapped
to the same {DIValue, DIExpression} pair we instead drop
the second dbg.value.

To ease the process of making lit tests for this utility a
new pass is introduced called RedundantDbgInstElimination.
It can be executed by opt using -redundant-dbg-inst-elim.

Reviewers: aprantl, jmorse, vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71478
2019-12-16 11:41:21 +01:00
Johannes Doerfert 139c9ef45a [Attributor] Annotate call sites of declarations with a callback
Even if a declaration is called, if there is a callback we might need
the information during CG-SCC traversal (D70767).
2019-12-13 23:51:59 -06:00
Johannes Doerfert 3d347e2835 [Attributor][NFC] Simplify debug printing for abstract attributes
This also fixes a type in the debug printing of AANoAlias.
2019-12-13 23:51:59 -06:00
Johannes Doerfert 5d34602da4 [Attributor] Only replace instruction operands
This was part of D70767. When we replace the value of (call/invoke)
instructions we do not want to disturb the old call graph so we will
only replace instruction uses until we get rid of the old PM.

Accepted as part of D70767.
2019-12-13 22:16:38 -06:00
Francesco Petrogalli 19f73f0d1b Revert "[VectorUtils] Introduce the Vector Function Database (VFDatabase)."
This reverts commit 0be81968a2.

The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
2019-12-13 19:42:04 +00:00
Fangrui Song 193da743db [profile] Fix a crash when -fprofile-remapping-file= triggers an error
Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D71485
2019-12-13 11:38:20 -08:00
Hiroshi Yamauchi ed50e6060b [PGO][PGSO] Enable size optimizations in code gen / target passes for cold code.
Summary: Split off of D67120.

Reviewers: davidxl

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71288
2019-12-13 11:01:19 -08:00
Nicola Zaghen 97572775d2 Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.

This fixes the buildbot failures.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-13 14:30:21 +00:00
Evgenii Stepanov dabd2622a8 hwasan: add tag_offset DWARF attribute to optimized debug info
Summary:
Support alloca-referencing dbg.value in hwasan instrumentation.
Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in
loclist format.

Reviewers: pcc

Subscribers: srhines, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70753
2019-12-12 16:18:54 -08:00
Johannes Doerfert 6abd01e462 [Attributor][FIX] Do treat byval arguments special
When we reason about the pointer argument that is byval we actually
reason about a local copy of the value passed at the call site. This was
not the case before and we wrongly introduced attributes based on the
surrounding function.

AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and
AANoCaptureCallSiteArgument are made aware of byval now. The code
to skip "subsuming positions" for reasoning follows a common pattern and
we should refactor it. A TODO was added.

Discovered by @efriedma as part of D69748.
2019-12-12 16:04:21 -06:00
Florian Hahn 526244b187 [Matrix] Add first set of matrix intrinsics and initial lowering pass.
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html

The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.

Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.

For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.

This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
 * Shape propagation to eliminate the embedding/splitting for each
   intrinsic.
 * Fused & tiled lowering of multiply and other operations.
 * Optimization remarks highlighting matrix expressions and costs.
 * Generate loops for operations on large matrixes.
 * More general block processing for operation on large vectors,
   exploiting shape information.

We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
  (1) become unwieldy for larger matrixes (even for 16x16 matrixes,
      the resulting shufflevector masks would be huge),
  (2) risk instcombine making small changes, causing us to fail to
      detect the transpose, preventing better lowerings

For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70456
2019-12-12 15:42:18 +00:00
Hideto Ueno 4ecf25545c [Attributor][NFC] Fix comments and unnecessary comma 2019-12-12 13:42:40 +00:00
Hideto Ueno 827bade262 [Attributor] [NFC] Use `checkForAllUses` helpr in `AAHeapToStackImpl::updateImpl`
Summary: Remove `Worklist` iteration and make use `checkForAllUses`. There is no test chage.

Reviewers: sstefan1, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71352
2019-12-12 13:27:53 +00:00
Hideto Ueno 63599bd072 [Attributor][NFC] Refactoring `AANoFreeArgument::updateImpl`
Summary: Refactoring `AANoFreeArgument::updateImpl`. There is no test change.

Reviewers: sstefan1, jdoerfert

Reviewed By: sstefan1

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71349
2019-12-12 13:27:53 +00:00
Nicola Zaghen f798eb21ec Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same."
This reverts commit 5f6208778f.

This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll
const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.
2019-12-12 10:29:54 +00:00
Nicola Zaghen 5f6208778f [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-12 10:07:01 +00:00
Wenlei He d275a06487 [AutoFDO] Statistic for context sensitive profile guided inlining
Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70584
2019-12-11 21:37:21 -08:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Reid Kleckner 85ba5f637a Rename TTI::getIntImmCost for instructions and intrinsics
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.

Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.

Split off from D71320

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D71381
2019-12-11 18:00:20 -08:00
Nikita Popov 8db5143b1a [InstCombine] Optimize overflow check base on uadd.with.overflow result
Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.

This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.

We can run into this situation if you have both a uadd.with.overflow
and a manual add + overflow check in the same function (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.

The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.

This does not yet handle the negated version of this pattern.

Differential Revision: https://reviews.llvm.org/D58644
2019-12-11 20:52:04 +01:00
Nikita Popov b361d3bbcd [MergeFuncs] Remove incorrect attribute copying
Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was
originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1.
However, the attribute copying was done in the wrong place (in general
call replacement, not thunk generation) and a proper fix was
implemented in D12581.

Previously this code was just unnecessary but harmless (because
FunctionComparator ensured that the attributes of the two functions
are exactly the same), but since byval was changed to accept a type
this copying is actively wrong and may result in malformed IR.

Differential Revision: https://reviews.llvm.org/D71173
2019-12-11 20:09:54 +01:00
Guillaume Chatelet 0a0d54b357 [Alignment][NFC] Introduce Align in IRBuilder
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71343
2019-12-11 14:41:23 +01:00
Guillaume Chatelet 3491109587 Rollback assumeAligned in MemorySanitizer
Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing).

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71332
2019-12-11 14:25:21 +01:00
Guillaume Chatelet 8a7c52bc22 [Alignment][NFC] Introduce Align in SROA
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71277
2019-12-11 09:34:38 +01:00
Vlad Tsyrklevich 636c93ed11 Revert "Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty..."
This reverts commit f2ba93971c, it was
causing build timeouts on sanitizer-x86_64-linux-autoconf such as
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/44917
2019-12-10 16:03:17 -08:00
Francesco Petrogalli 0be81968a2 [VectorUtils] Introduce the Vector Function Database (VFDatabase).
This patch introduced the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]

In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.

The VFISAKind field `ISA` of VFShape have been moved into into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available version (TLI, AVX,
AVX2, ...)  the system should prefer, when multiple versions with the
same VFShape are present.

Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.

[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.

Differential Revision: https://reviews.llvm.org/D67572
2019-12-10 16:36:44 +00:00
Sanjay Patel 396d18aeb6 [InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded
This pattern is noted as a regression from:
D70246
...where we removed an over-aggressive shuffle simplification.

SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses,
so I'm proposing to pattern match the minimal sequence directly. This fold does not
conflict with any of our current shuffle undef/poison semantics.

Differential Revision: https://reviews.llvm.org/D71220
2019-12-10 10:10:05 -05:00
Guillaume Chatelet 1b2842bf90 [Alignment][NFC] CreateMemSet use MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71213
2019-12-10 15:17:44 +01:00
stozer f2ba93971c Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty...
basic blocks

Originally applied in 72ce759928.

Fixed a build failure caused by incorrect use of cast instead of
dyn_cast.

This reverts commit 8b0780f795.
2019-12-10 13:33:32 +00:00
Djordje Todorovic 9b9e995819 [DebugInfo][EarlyCSE] Use the salvageDebugInfoOrMarkUndef(); NFC
Use the newest API.

Differential Revision: https://reviews.llvm.org/D71061
2019-12-09 13:57:35 +01:00
David Green be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
Florian Hahn c491949694 [LV] Pick correct BB as insert point when fixing PHI for FORs.
Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produce in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.

Fixes PR44020.

Reviewers: hsaito, fhahn, Ayal, dorit

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D71071
2019-12-07 19:32:00 +00:00
Florian Hahn c25de56905 [SimplifyCFG] Account for N being null.
Fixes a crash, e.g.
  http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15119/
2019-12-07 17:23:42 +00:00
Rodrigo Caetano Rocha d714aa0dfd [SimplifyCFG] Handle AssumptionCache being null.
AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG.

Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>

Reviewers: fhahn, lebedev.ri, spatel

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D69963
2019-12-07 16:54:49 +00:00
Florian Hahn e60b36cf92 [VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC).
The file is intended to gather various VPlan transformations, not only
CFG related transforms. Actually, the only transformation there is not
CFG related.

Reviewers: Ayal, gilr, hsaito, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70732
2019-12-07 08:56:35 +00:00
Teresa Johnson c8e36862f5 [WPD] Remove unused parameter (NFC)
Remove unused parameter.
2019-12-06 13:14:21 -08:00
Wenlei He 7b61ae68ec [AutoFDO] Inline replay for cold/small callees from sample profile loader
Summary:
Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout.

This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70750
2019-12-06 11:44:45 -08:00
Sanjay Patel 43e2a901e1 Revert "[InstCombine] reduce code duplication; NFC"
This reverts commit db57396584.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
2019-12-06 14:24:14 -05:00
Sanjay Patel b6d6f5470f Revert "[InstCombine] improve readability; NFC"
This reverts commit 7250ef3613.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
2019-12-06 14:20:44 -05:00
Sanjay Patel 142a75a9b1 Revert "[InstCombine] reduce indentation; NFC"
This reverts commit 8bf8ef7116.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
2019-12-06 14:19:02 -05:00
Sanjay Patel 8bf8ef7116 [InstCombine] reduce indentation; NFC 2019-12-06 13:26:45 -05:00
Sanjay Patel 7250ef3613 [InstCombine] improve readability; NFC
CreateIntCast returns the input if its type matches, so need to duplicate that check.
2019-12-06 13:26:45 -05:00
Sanjay Patel db57396584 [InstCombine] reduce code duplication; NFC 2019-12-06 13:26:45 -05:00
Sanjay Patel 6bb62a9d97 [InstCombine] improve readability; NFC 2019-12-06 13:26:44 -05:00
Gil Rapaport 39ccc099c9 [LV] Record GEP widening decisions in recipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit moves GEP operand queries controlling how GEPs are widened to a
dedicated recipe and extracts GEP widening code to its own ILV method taking
those recorded decisions as arguments. This reduces ingredient def-use usage by
ILV as a step towards full VPlan-based def-use relations.

Differential revision: https://reviews.llvm.org/D69067
2019-12-06 13:41:19 +02:00
Daniil Suchkov c4d8c6319f [LCSSA] Don't use VH callbacks to invalidate SCEV when creating LCSSA phis
In general ValueHandleBase::ValueIsRAUWd shouldn't be called when not
all uses of the value were actually replaced, though, currently
formLCSSAForInstructions calls it when it inserts LCSSA-phis.

Calls of ValueHandleBase::ValueIsRAUWd were added to LCSSA specifically
to update/invalidate SCEV. In the best case these calls duplicate some
of the work already done by SE->forgetValue, though in case when SCEV of
the value is SCEVUnknown, SCEV replaces the underlying value of
SCEVUnknown with the new value (i.e. acts like LCSSA-phi actually fully
replaces the value it is created for), which leads to SCEV being
corrupted because LCSSA-phi rarely dominates all uses of its inputs.

Fixes bug https://bugs.llvm.org/show_bug.cgi?id=44058.

Reviewers: fhahn, efriedma, reames, sanjoy.google

Reviewed By: fhahn

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70593
2019-12-06 13:21:49 +07:00
Teresa Johnson 54a3c2a81e [ThinLTO] Add option to disable readonly/writeonly attribute propagation
Summary:
Add an option to allow the attribute propagation on the index to be
disabled, to allow a workaround for issues (such as that fixed by
D70977).

Also move the setting of the WithAttributePropagation flag on the index
into propagateAttributes(), and remove some old stale code that predated
this flag and cleared the maybe read/write only bits when we need to
disable the propagation (previously only when importing disabled, now
also when the new option disables it).

Reviewers: evgeny777, steven_wu

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70984
2019-12-05 16:33:54 -08:00
Wenlei He 532196d811 [AutoFDO] Top-down Inlining for specialization with context-sensitive profile
Summary:
AutoFDO's sample profile loader processes function in arbitrary source code order, so if I change the order of two functions in source code, the inline decision can change. This also prevented the use of context-sensitive profile to do specialization while inlining. This commit enforces SCC top-down order for sample profile loader. With this change, we can now do specialization, as illustrated by the added test case:

Say if we have A->B->C and D->B->C call path, we want to inline C into B when root inliner is B, but not when root inliner is A or D, this is not possible without enforcing top-down order. E.g. Once C is inlined into B, A and D can only choose to inline (B->C) as a whole or nothing, but what we want is only inline B into A and D, not its recursive callee C. If we process functions in top-down order, this is no longer a problem, which is what this commit is doing.

This change is guarded with a new switch "-sample-profile-top-down-load" for tuning, and it depends on D70653. Eventually, top-down can be the default order for sample profile loader.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits, tejohnson

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70655
2019-12-05 16:07:01 -08:00
Wenlei He e503fd85d3 [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function
Summary:
When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit try to keep context-sensitive profile for such cases:

 - Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile.
 - Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay.

A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70653
2019-12-05 15:57:55 -08:00
Florian Hahn 19071173fc Revert "[DSE] Fix for a dangling point bug in DeadStoreElimination."
The commit causes a failure:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20911

This reverts commit 1847fd9d85.
2019-12-05 19:29:21 +00:00
Evgenii Stepanov 6f89cbc429 LowerDbgDeclare: look through bitcasts.
Summary:
Emit a value debug intrinsic (with OP_deref) when an alloca address is
passed to a function call after going through a bitcast.

This generates an FP or SP-relative location for the local variable in
the following case:
  int x;
  use((void *)&x;

Reviewers: aprantl, vsk, pcc

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70752
2019-12-05 11:19:07 -08:00
Bob Haarman 055779a9ac Revert "[InstCombine] keep assumption before sinking calls"
Summary:
This reverts commit c3b06d0c39.

Reason for revert: Caused miscompiles when inserting assume for undef.

Also adds a test to prevent similar breakage in future.

Fixes PR44154.

Reviewers: rnk, jdoerfert, efriedma, xbolva00

Reviewed By: rnk

Subscribers: thakis, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70933
2019-12-05 10:39:34 -08:00
Roman Lebedev 796fa662f1
[InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`)
Summary:
D68408 proposes to greatly improve our negation sinking abilities.
But in current canonicalization, we produce `sub A, zext(B)`,
which we will consider non-canonical and try to sink that negation,
undoing the existing canonicalization.
So unless we explicitly stop producing previous canonicalization,
we will have two conflicting folds, and will end up endlessly looping.

This inverts canonicalization, and adds back the obvious fold
that we'd miss:
* `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)`
  https://rise4fun.com/Alive/xx4
* `sext(bool) + C -> bool ? C - 1 : C`
  https://rise4fun.com/Alive/fBl

It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()`/`@ashr_out_of_range()`
(oss-fuzz 4871) are no longer folded as much, though those aren't really worrying.

Reviewers: spatel, efriedma, t.p.northover, hfinkel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71064
2019-12-05 21:21:30 +03:00
Ankit 1847fd9d85 [DSE] Fix for a dangling point bug in DeadStoreElimination.
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.

While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.

In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.

Patch by Ankit <quic_aankit@quicinc.com>

Reviewers: fhahn, bcahoon, efriedma

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D65326
2019-12-05 17:53:58 +00:00
Sanjay Patel 3c6b5d3674 [InstCombine] narrow select with FP casts
Select doesn't change values, so truncate of extended operand cancels out.
2019-12-05 11:12:44 -05:00
Sanjay Patel 51e420c27e [InstCombine] add FMF guard to builder in fptrunc transform; NFC
This makes no difference currently because we don't apply FMF
to FP casts, but that may change.

This could also be a place to add a fold for select with fptrunc,
so it will make that patch easier/smaller.
2019-12-05 10:55:07 -05:00
Roman Lebedev 09311459e3
[InstCombine] Extend `0 - (X sdiv C) -> (X sdiv -C)` fold to non-splat vectors
Split off from https://reviews.llvm.org/D68408
2019-12-05 15:48:29 +03:00
Teresa Johnson e420c0c78e [ThinLTO] Fix importing of writeonly variables in distributed ThinLTO
Summary:
D69561/dde5893 enabled importing of readonly variables with references,
however, it introduced a bug relating to importing/internalization of
writeonly variables with references.

A fix for this was added in D70006/7f92d66. But this didn't work in
distributed ThinLTO mode. The reason is that the fix (importing the
writeonly var with a zeroinitializer) was only applied when there were
references on the writeonly var summary. In distributed ThinLTO mode,
where we only have a small slice of the index, we will not have the
references on the importing side if we are not importing those
referenced values. Rather than changing this handshaking (which will
require a lot of other changes, since that's how we know what to import
in the distributed backend clang invocation), we can simply always give
the writeonly variable a zero initializer.

Reviewers: evgeny777, steven_wu

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70977
2019-12-04 14:59:27 -08:00
Tozer 8b0780f795 Revert "[DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks"
This reverts commit 72ce759928.

Reverted due to build failure.
2019-12-04 18:47:08 +00:00
Vedant Kumar f208b70fbc Revert "[Coverage] Revise format to reduce binary size"
This reverts commit e18531595b.

On Windows, there is an error:

http://lab.llvm.org:8011/builders/sanitizer-windows/builds/54963/steps/stage%201%20check/logs/stdio

error: C:\b\slave\sanitizer-windows\build\stage1\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-merging.cpp.tmp.v1.o: Failed to load coverage: Malformed coverage data
2019-12-04 10:35:14 -08:00
Vedant Kumar e18531595b [Coverage] Revise format to reduce binary size
Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2019-12-04 10:10:55 -08:00
Florian Hahn e8a5c17211 [LoopInterchange] Improve inner exit loop safety checks.
The PHI node checks for inner loop exits are too permissive currently.
As indicated by an existing comment, we should only allow LCSSA PHI
nodes that are part of reductions or are only used outside of the loop
nest. We ensure this by checking the users of the LCSSA PHIs.
Specifically, it is not safe to use an exiting value from the inner loop in the latch of the outer
loop.

It also moves the inner loop exit check before the outer loop exit
check.

Fixes PR43473.

Reviewers: efriedma, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D68144
2019-12-04 17:46:01 +00:00
Francesco Petrogalli a249551bb2 [llvm][Transform] Remove unused variable. [NFCI]
The variable prevents compiling when using -Werror=unused-variable.
2019-12-04 17:40:30 +00:00
Hiroshi Yamauchi 62d429972e [PGO][PGSO] Distinguish queries from unit tests and explicitly enable for the existing IR passes only. NFC.
Summary:
This is one more prep step necessary before the code gen pass instrumentation
code could go in.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70988
2019-12-04 09:35:50 -08:00
stozer 72ce759928 [DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks
When basic blocks are killed, either due to being empty or to being an if.then
or if.else block whose complement contains identical instructions, some of the
debug intrinsics in that block are lost. This patch sinks those intrinsics
into the single successor block, setting them Undef if necessary to
prevent debug info from falling out-of-date.

Differential Revision: https://reviews.llvm.org/D70318
2019-12-04 16:01:49 +00:00
Florian Hahn 4a9cde5a79 [SimpleLoopUnswitch] Invalidate the topmost loop with ExitBB as exiting.
SCEV caches the exiting blocks when computing exit counts. In
SimpleLoopUnswitch, we split the exit block of the loop to unswitch.

Currently we only invalidate the loop containing that exit block, but if
that block is the exiting block for a parent loop, we have stale cache
entries. We have to invalidate the top-most loop that contains the exit
block as exiting block. We might also be able to skip invalidating the
loop containing the exit block, if the exit block is not an exiting
block of that loop.

There are also 2 more places in SimpleLoopUnswitch, that use a similar
problematic approach to get the loop to invalidate. If the patch makes
sense, I will also update those places to a similar approach (they deal
with multiple exit blocks, so we cannot directly re-use
getTopMostExitingLoop).

Fixes PR43972.

Reviewers: skatkov, reames, asbirlea, chandlerc

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D70786
2019-12-04 11:32:09 +00:00
Ehud Katz 2b6b8cb10c [APFloat] Prevent construction of APFloat with Semantics and FP value
Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)`
may seem like they accept a FP (floating point) value, but the overload
they reach is actually the `integerPart` one, not a `float` or `double`
overload (which only exists when `fltSemantics` isn't passed).

This may lead to possible loss of data, by the conversion from `float`
or `double` to `integerPart`.

To prevent future mistakes, a new constructor overload, which accepts
any FP value and marked with `delete`, to prevent its usage.

Fixes PR34095.

Differential Revision: https://reviews.llvm.org/D70425
2019-12-04 12:02:04 +02:00
Craig Topper 5ebbabc1af [InstCombine] Revert aafde063aa and 6749dc3446 related to bitcast handling of x86_mmx
This reverts these two commits
[InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
[InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement

We're seeing at least one internal test failure related to a
bitcast that was previously before an inline assembly block
containing emms being placed after it. This leads to the mmx
state ending up not empty after the emms. IR has no way to
make any specific guarantees about this. Reverting these patches
to get back to previous behavior which at least worked for this
test.
2019-12-03 14:02:22 -08:00
Ayal Zaks 6ed9cef25f [LV] Scalar with predication must not be uniform
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.

Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of VF lanes is used, such a
replicating region becomes erroneous - only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".

Differential Revision: https://reviews.llvm.org/D70298
2019-12-03 19:50:24 +02:00
Anton Afanasyev a315519c17 [SLP] Enhance SLPVectorizer to vectorize different combinations of aggregates
Summary:
Make SLPVectorize to recognize homogeneous aggregates like
`{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`,
`[2 x {float, float}]` and so on.
It's a follow-up of https://reviews.llvm.org/D70068.
Merged `findBuildVector()` and `findBuildAggregate()` to
one `findBuildAggregate()` function making it recursive
to recognize multidimensional aggregates. Aggregates required
to be homogeneous.

Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70587
2019-12-03 19:29:27 +03:00
Florian Hahn e9c68422de [VPlan] Add dump function to VPlan class.
This adds a dump() function to VPlan, which uses the existing
operator<<.

This method provides a convenient way to dump a VPlan while debugging,
e.g. from lldb.

Reviewers: hsaito, Ayal, gilr, rengolin

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D70920
2019-12-03 11:59:10 +00:00
Johannes Altmanninger 09667bc192 [asan] Remove debug locations from alloca prologue instrumentation
Summary:
This fixes https://llvm.org/PR26673
"Wrong debugging information with -fsanitize=address"
where asan instrumentation causes the prologue end to be computed
incorrectly: findPrologueEndLoc, looks for the first instruction
with a debug location to determine the prologue end.  Since the asan
instrumentation instructions had debug locations, that prologue end was
at some instruction, where the stack frame is still being set up.

There seems to be no good reason for extra debug locations for the
asan instrumentations that set up the frame; they don't have a natural
source location.  In the debugger they are simply located at the start
of the function.

For certain other instrumentations like -fsanitize-coverage=trace-pc-guard
the same problem persists - that might be more work to fix, since it
looks like they rely on locations of the tracee functions.

This partly reverts aaf4bb2394
"[asan] Set debug location in ASan function prologue"
whose motivation was to give debug location info to the coverage callback.
Its test only ensures that the call to @__sanitizer_cov_trace_pc_guard is
given the correct source location; as the debug location is still set in
ModuleSanitizerCoverage::InjectCoverageAtBlock, the test does not break.
So -fsanitize-coverage is hopefully unaffected - I don't think it should
rely on the debug locations of asan-generated allocas.

Related revision: 3c6c14d14b
"ASAN: Provide reliable debug info for local variables at -O0."

Below is how the X86 assembly version of the added test case changes.
We get rid of some .loc lines and put prologue_end where the user code starts.

```diff
--- 2.master.s	2019-12-02 12:32:38.982959053 +0100
+++ 2.patch.s	2019-12-02 12:32:41.106246674 +0100
@@ -45,8 +45,6 @@
 	.cfi_offset %rbx, -24
 	xorl	%eax, %eax
 	movl	%eax, %ecx
- .Ltmp2:
- 	.loc	1 3 0 prologue_end      # 2.c:3:0
 	cmpl	$0, __asan_option_detect_stack_use_after_return
 	movl	%edi, 92(%rbx)          # 4-byte Spill
 	movq	%rsi, 80(%rbx)          # 8-byte Spill
@@ -57,9 +55,7 @@
 	callq	__asan_stack_malloc_0
 	movq	%rax, 72(%rbx)          # 8-byte Spill
 .LBB1_2:
- 	.loc	1 0 0 is_stmt 0         # 2.c:0:0
 	movq	72(%rbx), %rax          # 8-byte Reload
- 	.loc	1 3 0                   # 2.c:3:0
 	cmpq	$0, %rax
 	movq	%rax, %rcx
 	movq	%rax, 64(%rbx)          # 8-byte Spill
@@ -72,9 +68,7 @@
 	movq	%rax, %rsp
 	movq	%rax, 56(%rbx)          # 8-byte Spill
 .LBB1_4:
- 	.loc	1 0 0                   # 2.c:0:0
 	movq	56(%rbx), %rax          # 8-byte Reload
- 	.loc	1 3 0                   # 2.c:3:0
 	movq	%rax, 120(%rbx)
 	movq	%rax, %rcx
 	addq	$32, %rcx
@@ -99,7 +93,6 @@
 	movb	%r8b, 31(%rbx)          # 1-byte Spill
 	je	.LBB1_7
 # %bb.5:
- 	.loc	1 0 0                   # 2.c:0:0
 	movq	40(%rbx), %rax          # 8-byte Reload
 	andq	$7, %rax
 	addq	$3, %rax
@@ -118,7 +111,8 @@
 	movl	%ecx, (%rax)
 	movq	80(%rbx), %rdx          # 8-byte Reload
 	movq	%rdx, 128(%rbx)
-	.loc	1 4 3 is_stmt 1         # 2.c:4:3
+.Ltmp2:
+	.loc	1 4 3 prologue_end      # 2.c:4:3
 	movq	%rax, %rdi
 	callq	f
 	movq	48(%rbx), %rax          # 8-byte Reload
```

Reviewers: eugenis, aprantl

Reviewed By: eugenis

Subscribers: ormris, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70894
2019-12-03 11:24:17 +01:00
Bill Wendling 87f146767e Place the "cold" code piece into the same section as the original function
Summary:
This cropped up in the Linux kernel where cold code was placed in an
incompatible section.

Reviewers: compnerd, vsk, tejohnson

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70925
2019-12-02 15:24:59 -08:00
Hiroshi Yamauchi 8cdfdfeee6 [PGO][PGSO] Add an optional query type parameter to shouldOptimizeForSize.
Summary:
In case of a need to distinguish different query sites for gradual commit or
debugging of PGSO. NFC.

Reviewers: davidxl

Subscribers: hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70510
2019-12-02 13:54:13 -08:00
Florian Hahn fe459ce65a [VPlan] Move graph traits (NFC).
By defining the graph traits right after the VPBlockBase definitions, we
can make use of them earlier in the file.

Reviewers: hsaito, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70733
2019-12-02 18:23:11 +00:00
Sanjay Patel af4e59949c [InstCombine] fix undef propagation for vector urem transform (PR44186)
As described here:
https://bugs.llvm.org/show_bug.cgi?id=44186

The match() code safely allows undef values, but we can't safely
propagate a vector constant that contains an undef to the new
compare instruction.
2019-12-02 12:17:38 -05:00
Simon Tatham 01aefae4a1 [ARM,MVE] Add an InstCombine rule permitting VPNOT.
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.

This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70484
2019-12-02 16:20:30 +00:00
Roman Lebedev 0f22e783a0
[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100)
rL341831 moved one-use check higher up, restricting a few folds
that produced a single instruction from two instructions to the case
where the inner instruction would go away.

Original commit message:
> InstCombine: move hasOneUse check to the top of foldICmpAddConstant
>
> There were two combines not covered by the check before now,
> neither of which actually differed from normal in the benefit analysis.
>
> The most recent seems to be because it was just added at the top of the
> function (naturally). The older is from way back in 2008 (r46687)
> when we just didn't put those checks in so routinely, and has been
> diligently maintained since.

From the commit message alone, there doesn't seem to be a
deeper motivation, deeper problem that was trying to solve,
other than 'fixing the wrong one-use check'.

As i have briefly discusses in IRC with Tim, the original motivation
can no longer be recovered, too much time has passed.

However i believe that the original fold was doing the right thing,
we should be performing such a transformation even if the inner `add`
will not go away - that will still unchain the comparison from `add`,
it will no longer need to wait for `add` to compute.

Doing so doesn't seem to break any particular idioms,
as least as far as i can see.

References https://bugs.llvm.org/show_bug.cgi?id=44100
2019-12-02 18:06:15 +03:00
Sanjay Patel af0babc90a [InstCombine] fold copysign with constant sign argument to (fneg+)fabs
If the sign of the sign argument is known (this could be extended to use ValueTracking),
then we can use fneg+fabs to clear/set the sign bit of the magnitude argument.
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic

This transform is already done in DAGCombiner, but we can do it sooner in IR as
suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

We have effectively no analysis for copysign in IR, so we are taking the unusual step
of increasing the number of IR instructions for the negative constant case.

Differential Revision: https://reviews.llvm.org/D70792
2019-12-02 09:23:12 -05:00
Bjorn Pettersson a9d6b0e544 [InstCombine] Fix big-endian miscompile of (bitcast (zext/trunc (bitcast)))
Summary:
optimizeVectorResize is rewriting patterns like:
  %1 = bitcast vector %src to integer
  %2 = trunc/zext %1
  %dst = bitcast %2 to vector

Since bitcasting between integer an vector types gives
different integer values depending on endianness, we need
to take endianness into account. As it happens the old
implementation only produced the correct result for little
endian targets.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178

Reviewers: spatel, lattner, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70844
2019-12-02 11:05:25 +01:00
David Green 59b56e5c57 [InstCombine] Expand usub_sat patterns to handle constants
The constants come through as add %x, -C, not a sub as would be
expected. They need some extra matchers to canonicalise them towards
usub_sat.

Differential Revision: https://reviews.llvm.org/D69514
2019-11-30 16:58:01 +00:00
David Green 3a1bef5616 [InstCombine] Adjust usub_sat fold one use checks
This adjusts the one use checks in the the usub_sat fold code to not
increase instruction count, but otherwise do the fold. Reviewed as a
part of D69514.
2019-11-30 16:58:00 +00:00
Hideto Ueno 6c742fdbf4 [Attributor] Deduce dereferenceable based on accessed bytes map
Summary:
This patch introduces the deduction based on load/store instructions whose pointer operand is a non-inbounds GEP instruction.
For example if we have,
```
void f(int *u){
 u[0] = 0;
 u[1] = 1;
 u[2] = 2;
}
```
then u must be dereferenceable(12).

This patch is inspired by D64258

Reviewers: jdoerfert, spatel, hfinkel, RKSimon, sstefan1, xbolva00, dtemirbulatov

Reviewed By: jdoerfert

Subscribers: jfb, lebedev.ri, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70714
2019-11-29 06:55:58 +00:00
Hideto Ueno dfedae5001 [Attributor] Remove dereferenceable_or_null when nonull is present
Summary: This patch prevents the simultaneous presence of `dereferenceable` and `dereferenceable_or_null` attribute

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70789
2019-11-29 06:45:07 +00:00
Dávid Bolvanský 40963b2bf0 Revert "[Attributor] Move pass after InstCombine to futher eliminate null pointer checks"
This reverts commit 7ca7d62c6e. Commited accidentally.
2019-11-27 22:45:47 +01:00
Dávid Bolvanský 7ca7d62c6e [Attributor] Move pass after InstCombine to futher eliminate null pointer checks
Summary: PR44149

Reviewers: jdoerfert

Subscribers: mehdi_amini, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70737
2019-11-27 22:36:51 +01:00
Hideto Ueno 0f4383faa7 [Attributor] Handle special case when offset equals zero in nonnull deduction 2019-11-27 14:45:16 +00:00
Eric Christopher fd39b1bb20 Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.""
This reapplies: 8ff85ed905

Original commit message:

As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.

This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.

Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.

Original post:

http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410

This reverts commit c9ddb02659.
2019-11-26 20:28:52 -08:00
Dávid Bolvanský 0e32fbd223 [InstCombine] Fixed std::min on some bots. NFCI 2019-11-26 11:06:31 +01:00
Dávid Bolvanský bb7b8540f0 [InstCombine] Optimize some memccpy calls to memcpy/null
Summary:
return memccpy(d, "helloworld", 'r', 20)
=>
return memcpy(d, "helloworld", 8 /* pos of 'r' in string */), d + 8

Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68089
2019-11-26 10:54:47 +01:00
Hideto Ueno 78a750276f [Attributor] Track a GEP Instruction in align deduction
Summary:
This patch enables us to track GEP instruction in align deduction.
If a pointer `B` is defined as `A+Offset` and known to have alignment `C`, there exists some integer Q such that
```
 A + Offset = C * Q = B
```
 So we can say that the maximum power of two which is a divisor of gcd(Offset, C) is an alignment.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70392
2019-11-26 07:55:28 +00:00
Muhammad Omair Javaid c9ddb02659 Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there."
This reverts commit 8ff85ed905.

This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu

Following tests were failing:
    lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py
    lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py
    lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py
    lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py
    lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py
    lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py
    lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
    lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py
    lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py
    lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
2019-11-26 09:32:13 +05:00
Eric Christopher 8ff85ed905 As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.

Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.

Original post:

http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
2019-11-25 17:16:46 -08:00
Sanjay Patel 35827164c4 [InstCombine] remove shuffle mask canonicalization that creates undef elements
This is NFC-intended because SimplifyDemandedVectorElts() does the same
transform later. As discussed in D70641, we may want to change that
behavior, so we need to isolate where it happens.
2019-11-25 13:33:56 -05:00
Whitney Tsang aaf7f05a96 [NFC][LoopFusion] Use isControlFlowEquivalent() from CodeMoverUtils.
Reviewer: kbarton, jdoerfert, Meinersbur, bmahjour, etiotto
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D70619
2019-11-25 17:54:42 +00:00
Sanjay Patel e85d2e4981 [InstCombine] prevent infinite loop from conflicting shuffle mask transforms
The pattern in question is currently not possible because we
aggressively (wrongly) transform mask elements to undef values
if they choose from an undef operand. That, however, would
change if we tighten our semantics for shuffles as discussed
in D70641. Adding this check gives us the flexibility to make
that change with minimal overhead for current definitions.
2019-11-25 12:00:41 -05:00
Sanjay Patel fc31b58eff [InstCombine] simplify code for shuffle mask canonicalization; NFC
We never use the local 'Mask' before returning, so that was dead code.
2019-11-25 11:11:12 -05:00
Sanjay Patel 847aabf11f [InstCombine] remove dead code from shuffle mask canonicalization; NFC 2019-11-25 10:54:18 -05:00
Sanjay Patel 20684092ab [InstCombine] simplify loop for shuffle mask canonicalization; NFC 2019-11-25 10:41:50 -05:00
OCHyams 2de23c8364 [DebugInfo@O2][Utils] Undef instead of delete dbg.values in helper func
Summary:
Related bug: https://bugs.llvm.org/show_bug.cgi?id=40648

Static helper function rewriteDebugUsers in Local.cpp deletes dbg.value
intrinsics when it cannot move or rewrite them, or salvage the deleted
instruction's value. It should instead undef them in this case.

This patch fixes that and I've added a test which covers the failing test
case in bz40648. I've updated the unit test Local.ReplaceAllDbgUsesWith
to check for this behaviour (and fixed a typo in the test which would
cause the old test to always pass).

Reviewers: aprantl, vsk, djtodoro, probinson

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D70604
2019-11-25 10:55:14 +00:00
Florian Hahn 9d24933f79 Recommit f0c2a5a "[LV] Generalize conditions for sinking instrs for first order recurrences."
This version contains 2 fixes for reported issues:
1. Make sure we do not try to sink terminator instructions.
2. Make sure we bail out, if we try to sink an instruction that needs to
   stay in place for another recurrence.

Original message:
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.

With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.

As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.

Fixes PR43398.

Reviewers: hsaito, dcaballe, Ayal, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D69228
2019-11-24 21:21:55 +00:00
Florian Hahn 9a432161c6 [LoopInterchange] Adjust assertions when updating successors.
Currently the assertion in updateSuccessor is overly strict in some
cases and overly relaxed in other cases. For branches to the inner and
outer loop preheader it is too strict, because they can either be
unconditional branches or conditional branches with duplicate targets.
Both cases are fine and we can allow updating multiple successors.

On the other hand, we have to at least update one successor. This patch
adds such an assertion.
2019-11-24 19:37:16 +00:00
Sanjay Patel f575f12c64 [InstCombine] remove identity shuffle simplification for mask with undefs
And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that
pattern. That preserves some of the old optimizations in IR.

Given a shuffle that includes undef elements in an otherwise identity mask like:

define <4 x float> @shuffle(<4 x float> %arg) {
  %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
  ret <4 x float> %shuf
}

We were simplifying that to the input operand.

But as discussed in PR43958:
https://bugs.llvm.org/show_bug.cgi?id=43958
...that means that per-vector-element poison that would be stopped by the shuffle can now
leak to the result.

Also note that we still have (and there are tests for) the same transform with no undef
elements in the mask (a fully-defined identity mask). I don't think there's any
controversy about that case - it's a valid transform under any interpretation of
shufflevector/undef/poison.

Looking at a few of the diffs into codegen, I don't see any difference in final asm. So
depending on your perspective, that's good (no real loss of optimization power) or bad
(poison exists in the DAG, so we only partially fixed the bug).

Differential Revision: https://reviews.llvm.org/D70246
2019-11-24 10:06:26 -05:00
Davide Italiano c32f0ff92f [InstCombine] Fix call guard difference with dbg
Patch by Chris Ye!

Differential Revision: https://reviews.llvm.org/D68004
2019-11-22 13:35:53 -08:00
Tsang Whitney W.H ae8a8c2db6 [CodeMoverUtils] Added an API to check if an instruction can be safely
moved before another instruction.
Summary:Added an API to check if an instruction can be safely moved
before another instruction. In future PRs, we will like to add support
of moving instructions between blocks that are not control flow
equivalent, and add other APIs to enhance usability, e.g. moving basic
blocks, moving list of instructions...
Loop Fusion will be its first user. When there is intervening code in
between two loops, fusion is currently unable to fuse them. Loop Fusion
can use this utility to check if the intervening code can be safely
moved before or after the two loops, and move them, then it can
successfully fuse them.
Reviewer:kbarton,jdoerfert,Meinersbur,bmahjour,etiotto
Reviewed By:bmahjour
Subscribers:mgorny,hiraditya,llvm-commits
Tag:LLVM
Differential Revision:https://reviews.llvm.org/D70049
2019-11-22 21:29:08 +00:00
Anton Afanasyev 80cd6b6e04 [SLP] Enhance SLPVectorizer to vectorize vector aggregate
Summary:
Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`.
This patch allows `findBuildAggregate()` to consider vector aggregates as
well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.

Fixes vector part of llvm.org/PR42022

Reviewers: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70068
2019-11-22 20:01:59 +03:00
Kazu Hirata a195556628 [JumpThreading] NFC: Don't cache F.hasProfileData()
Summary:
With this patch, we no longer cache F.hasProfileData().  We simply
call the function again.

I'm doing this because:

- JumpThreadingPass also has a member variable named HasProfileData,
  which is very confusing,

- the function is very lightweight, and

- this patch makes JumpThreading::runOnFunction more consistent with
  JumpThreadingPass::run.

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70602
2019-11-22 08:51:14 -08:00
Kazu Hirata 1a58be2ac5 [JumpThreading] Use profile data even with the new pass manager
Summary:
Without this patch, the jump threading pass ignores profiling data
whenever we invoke the pass with the new pass manager.

Specifically, JumpThreadingPass::run calls runImpl with class variable
HasProfileData always set to false.  In turn, runImpl sets
HasProfileData to false again:

  HasProfileData = HasProfileData_;

In the end, we don't use profiling data at all with the new pass
manager.

This patch fixes the problem by passing F.hasProfileData() to runImpl.

The bug appears to have been introduced at:

  https://reviews.llvm.org/D41461

which removed local variable HasProfileData in JumpThreadingPass::run
even though there was one more use left in the same function.  As a
result, the remaining use ended referring to the class variable
instead.

Note that F.hasProfileData is an extremely lightweight function, so I
don't see the need to cache its result.  Once this patch is approved,
I'm planning to stop caching the result of F.hasProfileData in
runOnFunction.

Reviewers: wmi, eli.friedman

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70509
2019-11-22 08:21:48 -08:00
Pankaj Gode 04945c92ce [WIP][Attributor] AAReachability Attribute
Summary: Working towards Johannes's suggestion for fixme, in Attributor's Noalias attribute deduction.
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
// check only uses possibly executed before this call site.

A Reachability abstract attribute answers the question "does execution at point A potentially reach point B". If this question is answered with false for all other uses of the value that might be captured, we know it is not *yet* captured and can continue with the noalias deduction. Currently, information AAReachability provides is completely pessimistic.

    Reviewers: jdoerfert

    Reviewed By: jdoerfert

    Subscribers: uenoku, sstefan1, hiraditya, llvm-commits

    Differential Revision: https://reviews.llvm.org/D70233
2019-11-22 18:40:47 +05:30
Pankaj Gode b9a26a80c8 Test commit. 2019-11-22 14:46:43 +05:30
Alina Sbirlea fa09dddd70 [LoopInstSimplify] Move MemorySSA verification under flag.
The verification inside loop passes should be done under the
VerifyMemorySSA flag (enabled by EXPESIVE_CHECKS or explicitly with
opt), in order to not add to compile time during regular builds.
2019-11-21 17:01:24 -08:00
Philip Reames dfb7a9091a [LoopPred] Robustly handle partially unswitched loops
We may end up with a case where we have a widenable branch above the loop, but not all widenable branches within the loop have been removed.  Since a widenable branch inhibit SCEVs ability to reason about exit counts (by design), we have a tradeoff between effectiveness of this optimization and allowing future widening of the branches within the loop.  LoopPred is thought to be one of the most important optimizations for range check elimination, so let's pay the cost.
2019-11-21 15:44:36 -08:00
Philip Reames 8293f74345 Further cleanup manipulation of widenable branches [NFC]
This is a follow on to aaea24802b.  In post commit discussion, Artur and I realized we could cleanup the code using Uses; this patch does so.
2019-11-21 15:07:30 -08:00
Vedant Kumar 844d97f650 Clang-trunk Generates Wrong Debug values with -O1
Bit-Tracking Dead Code Elimination (bdce) do not mark dbg.value as undef after
deleting instruction.  which shows invalid state of variable in debugger.  This
patches fixes this by marking the dbg.value as undef which depends on dead
instruction.

This fixes https://bugs.llvm.org/show_bug.cgi?id=41925

Patch by kamlesh kumar!

Differential Revision: https://reviews.llvm.org/D70040
2019-11-21 13:53:10 -08:00
Kazu Hirata 4f5d931c58 [JumpThreading] Refactor ThreadEdge
Summary:
This patch moves various checks from ThreadEdge to new function
TryThreadEdge The rational behind this is that I'd like to use
ThreadEdge without its checks in my upcoming patch.

This patch preserves lightweight checks as assertions in ThreadEdge.
ThreadEdge does not repeat the cost check, however.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70338
2019-11-21 12:38:22 -08:00
Tom Stellard ab411801b8 [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO.  I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so.  Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:

1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so.  This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.

With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.

2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set.  This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.

I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:

- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON

Reviewers: beanz, smeenai, compnerd, phosek

Reviewed By: beanz

Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70179
2019-11-21 10:48:08 -08:00
Philip Reames aaea24802b Broaden the definition of a "widenable branch"
As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsics. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality.

Broaden the definition in two ways:
    Allow swapped operands to the br (and X, WC()) form
    Allow widenable branch w/trivial condition (i.e. true) which takes form of br i1 WC()

The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The later is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop.

Differential Revision: https://reviews.llvm.org/D70502
2019-11-21 10:46:16 -08:00
Sanjay Patel 4ae0a13256 [InstCombine] add assert in SimplifyDemandedVectorElts and improve readability; NFC 2019-11-21 11:16:36 -05:00
Sjoerd Meijer 901cd3b3f6 [LV] PreferPredicateOverEpilog respecting option
Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when
option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not
to predicate the loop.

Differential Revision: https://reviews.llvm.org/D70382
2019-11-21 14:06:10 +00:00
Ilya Biryukov aa981c1802 Reland 9f3fdb0d7fab: [Driver] Use VFS to check if sanitizer blacklists exist
With updates to various LLVM tools that use SpecialCastList.

It was tempting to use RealFileSystem as the default, but that makes it
too easy to accidentally forget passing VFS in clang code.
2019-11-21 11:56:09 +01:00
Ilya Biryukov 9f3fdb0d7f Revert "[Driver] Use VFS to check if sanitizer blacklists exist"
This reverts commit ba6f906854.
Commit caused compilation errors on llvm tests. Will fix and re-land.
2019-11-21 11:31:14 +01:00
Ilya Biryukov ba6f906854 [Driver] Use VFS to check if sanitizer blacklists exist
Summary:
This is a follow-up to 590f279c45, which
moved some of the callers to use VFS.

It turned out more code in Driver calls into real filesystem APIs and
also needs an update.

Reviewers: gribozavr2, kadircet

Reviewed By: kadircet

Subscribers: ormris, mgorny, hiraditya, llvm-commits, jkorous, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70440
2019-11-21 11:00:30 +01:00
David Stenberg 3889ff82bf [DebugInfo] Refactor DIExpression [SZ]Ext creation into function [NFC]
Summary:
Also, replace the SmallVector with a normal C array.

Reviewers: vsk

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70498
2019-11-21 10:44:04 +01:00
James Y Knight e47d6da8a5 D'oh. Fix assert after a84922916e.
(Which was attempting to fix unused variable warning in NDEBUG mode after 8ba56f322a)
2019-11-20 22:22:51 -05:00
James Y Knight a84922916e Fix unused variable warning in NDEBUG mode after 8ba56f322a 2019-11-20 22:05:05 -05:00
Alina Sbirlea 5c5cf899ef [MemorySSA] Moving at the end often means before terminator.
Moving accesses in MemorySSA at InsertionPlace::End, when an instruction is
moved into a block, almost always means insert at the end of the block, but
before the block terminator. This matters when the block terminator is a
MemoryAccess itself (an invoke), and the insertion must be done before
the terminator for the update to be correct.

Insert an additional position: InsertionPlace:BeforeTerminator and update
current usages where this applies.

Resolves PR44027.
2019-11-20 17:11:00 -08:00
Alina Sbirlea da4baa2a6c [MemorySSA] Update analysis when the terminator is a memory instruction.
Update MemorySSA when moving the terminator instruction, as that may be a memory touching instruction.
Resolves PR44029.
2019-11-20 16:36:52 -08:00
Eric Christopher 714aabacfb Temporarily Revert "[SLP] allow forming 2-way reduction patterns" and update testcases.
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 8a0aa5310b.
2019-11-20 16:00:53 -08:00
Eric Christopher 8a0aa5310b Temporarily Revert "Temporarily Revert "[SLP] allow forming 2-way reduction patterns""
as there were testcase changes after that need to also be reverted.

This reverts commit cd8748a15f.
2019-11-20 15:39:47 -08:00
Eric Christopher cd8748a15f Temporarily Revert "[SLP] allow forming 2-way reduction patterns"
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 7ff57705ba.
2019-11-20 15:19:31 -08:00
Philip Reames 8ba56f322a Move widenable branch formation into makeGuardControlFlowExplicit helper
This is mostly NFC, but I removed the setting of the guard's calling convention onto the WC call.  Why?  Because it was untested, and was producing an ill defined output as the declaration's convention wasn't been changed leaving a mismatch which is UB.
2019-11-20 12:54:05 -08:00
Philip Reames 28a91473e3 [GuardWidening] Remove WidenFrequentBranches transform
This code has never been enabled.  While it is tested, it's complicating some refactoring.  If we decide to re-implement this, doing it in SimplifyCFG would probably make more sense anyways.
2019-11-19 15:15:52 -08:00
Philip Reames 70c68a6b0e [NFC] Factor out utilities for manipulating widenable branches
With the widenable condition construct, we have the ability to reason about branches which can be 'widened' (i.e. made to fail more often).  We've got a couple o transforms which leverage this.  This patch just cleans up the API a bit.

This is prep work for generalizing our definition of a widenable branch slightly.  At the moment "br i1 (and A, wc()), ..." is considered widenable, but oddly, neither "br i1 (and wc(), B), ..." or "br i1 wc(), ..." is.  That clearly needs addressed, so first, let's centralize the code in one place.
2019-11-19 14:43:13 -08:00
Philip Reames f3eb5dee57 [LoopPred] Generalize profitability check to handle unswitch output
Unswitch (and other loop transforms) like to generate loop exit blocks with unconditional successors, and phi nodes (LCSSA, or simple multiple exiting blocks sharing an exit).  Generalize the "likely very rare exit" check slightly to handle this form.
2019-11-19 14:06:36 -08:00
Duncan P. N. Exon Smith 3279724905 llvm/ObjCARC: Eliminate inlined AutoreleaseRV calls
Pair up inlined AutoreleaseRV calls with their matching RetainRV or
ClaimRV.

- RetainRV cancels out AutoreleaseRV.  Delete both instructions.
- ClaimRV is a peephole for RetainRV+Release.  Delete AutoreleaseRV and
  replace ClaimRV with Release.

This avoids problems where more aggressive inlining triggers memory
regressions.

This patch is happy to skip over non-callable instructions and non-ARC
intrinsics looking for the pair.  It is likely sound to also skip over
opaque function calls, but that's harder to reason about, and it's not
relevant to the goal here: if there's an opaque function call splitting
up a pair, it's very unlikely that a handshake would have happened
dynamically without inlining.

Note that this patch also subsumes the previous logic that looked
backwards from ReleaseRV.

https://reviews.llvm.org/D70370
rdar://problem/46509586
2019-11-19 12:02:01 -08:00
Sanjay Patel 0a8e7ca402 [SLP] fix miscompile on min/max reductions with extra uses (PR43948) (2nd try)
The 1st attempt was reverted because it revealed an existing
bug where we could produce invalid IR (use of value before
definition). That should be fixed with:
rG39de82ecc9c2

The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-19 14:57:35 -05:00
Sanjay Patel 39de82ecc9 [SLP] fix insertion point for min/max reduction
As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
2019-11-19 10:50:10 -05:00
evgeny 4ef9315c4b [ThinLTO] Make ValueInfo::operator bool() explicit
Differential revision: https://reviews.llvm.org/D70383
2019-11-19 12:46:09 +03:00
Evgeniy Brevnov 4a64d710ae [NFC] Test commit. Please ignore.
As a test commit I fixed a misspelling in one of comments in SLP
vectorizer.
2019-11-19 15:41:57 +07:00
Eric Christopher 6f1cc4151a Temporarily revert "[SLP] fix miscompile on min/max reductions with extra uses (PR43948)"
as it causes an ICE on valid. A testcase was followed up on the original thread.

This reverts commit a3e61946c5.
2019-11-18 14:41:37 -08:00
Teresa Johnson cc1b0bc24d [ThinLTO] Avoid extra index lookup during promotion
Summary:
Pass down the already accessed ValueInfo to shouldPromoteLocalToGlobal,
to avoid an unnecessary extra index lookup.

Add some assertion checking to confirm we have a non-empty VI when
expected.

Also some misc cleanup, merging the two versions of
doImportAsDefinition, since one was only called by the other, and
unnecessarily passed in a member variable.

Reviewers: steven_wu, pcc, evgeny777

Reviewed By: evgeny777

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70337
2019-11-18 12:55:53 -08:00
Teresa Johnson 3be6dbca3b [ThinLTO] Promotion handling cleanup (NFC)
Summary:
Clean up the code that does GV promotion in the ThinLTO backends.

Specifically, we don't need to check whether we are importing since that
is already checked and handled correctly in shouldPromoteLocalToGlobal.
Simply call shouldPromoteLocalToGlobal, and if it returns true we are
guaranteed that we are promoting, whether or not we are importing (or in
the exporting module). This also makes the handling in getName()
consistent with that in getLinkage(), which checks the DoPromote parameter
regardless of whether we are importing or exporting.

Reviewers: steven_wu, pcc, evgeny777

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70327
2019-11-18 11:59:36 -08:00
Philip Reames ad5a84c883 [LoopPred/WC] Use a dominating widenable condition to remove analyze loop exits
This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.

The core notions of the transform are as follows:

    If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
    To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities.
    Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable and b) the latch is predicted taken, and c) the exit being removed is statically cold.

Differential Revision: https://reviews.llvm.org/D69830
2019-11-18 11:23:29 -08:00
Simon Tatham f4f77aa53e [ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.

  mve_pred16_t pred = vcmpeqq(v1, v2);
  v_out = vaddq_m(v_inactive, v3, v4, pred);

then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair
that move the predicate value pointlessly out of `p0` and back into it again.

To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:

  vpt.u16 eq, q1, q2
  vaddt.u16 q0, q3, q4

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70313
2019-11-18 10:39:30 +00:00
Duncan P. N. Exon Smith 783cb86b61 llvm/ObjCARC: Split OptimizeIndividualCallImpl out of OptimizeIndividualCalls, NFC
Split out a helper function for the individual call optimizations and
skip useless calls to it (where the instruction is not an ARC
intrinsic).  Besides reducing indentation (and possibly speeding up
compile time in some small way), an upcoming patch will add additional
calls and expand out the `switch`.
2019-11-17 21:54:27 -08:00
Duncan P. N. Exon Smith a937a588dd llvm/ObjCARC: Use continue to reduce some nesting, NFC 2019-11-17 18:22:35 -08:00
Sanjay Patel 5d67d81f48 [InstCombine] prevent crashing/assert on shift constant expression (PR44028)
The binary operator cast implies an instruction, but the matcher for shift does not:
https://bugs.llvm.org/show_bug.cgi?id=44028
2019-11-17 17:31:09 -05:00
Stefan Stipanovic a516fbac52 [Attributor] Use nofree argument attribute for heap-to-stack conversion
Reviewers: jdoerfert, uenoku

Subscribers:

Differential Revision: https://reviews.llvm.org/D70140
2019-11-17 21:35:04 +01:00
Sanjay Patel ebf9bf2cbc [SimplifyCFG] propagate fast-math-flags (FMF) from phi to select
Similar to/extension of D70208 (rGee0882bdf866), but this one
may finally allow closing motivating bugs.

This is another step towards having FMF apply only to FP values
rather than those + fcmp. See PR38086 for one of the original
discussions/motivations:
https://bugs.llvm.org/show_bug.cgi?id=38086

And the test here is derived from PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535

Currently, we lose FMF when converting any phi to select in
SimplifyCFG. There are a small number of similar changes needed
to correct within SimplifyCFG, so it should be quick to patch
this pass up.

FMF was extended to select and phi with:
D61917
D67564
2019-11-17 11:23:44 -05:00
David Green 08390c52a2 [InstCombine] Canonicalize ssub.with.overflow with clamp to ssub.sat
Working on top of D69252, this adds canonicalisation patterns for ssub.with.overflow to ssub.sats.

Differential Revision: https://reviews.llvm.org/D69753
2019-11-17 10:45:11 +00:00
David Green 03fce6b12e [InstCombine] Canonicalize sadd.with.overflow with clamp to sadd.sat
This adds to D69245, adding extra signed patterns for folding from a
sadd_with_overflow to a sadd_sat. These are more complex than the
unsigned patterns, as the overflow can occur in either direction.

For the add case, the positive overflow can only occur if both of the
values are positive (same for both the values being negative). So there
is an extra select on whether to use the positive or negative overflow
limit.

Differential Revision: https://reviews.llvm.org/D69252
2019-11-17 10:42:39 +00:00
Reid Kleckner 631be5c0d4 Remove Support/Options.h, it is unused
It was added in 2014 in 732e0aa9fb with one use in Scalarizer.cpp.
That one use was then removed when porting to the new pass manager in
2018 in b6f76002d9.

While the RFC and the desire to get off of static initializers for
cl::opt all still stand, this code is now dead, and I think we should
delete this code until someone is ready to do the migration.

There were many clients of CommandLine.h that were it transitively
through LLVMContext.h, so I cleaned that up in 4c1a1d3cf9.

Reviewers: beanz

Differential Revision: https://reviews.llvm.org/D70280
2019-11-15 13:32:52 -08:00
Sanjay Patel ee0882bdf8 [SimplifyCFG] propagate fast-math-flags (FMF) from phi to select
This is another step towards having FMF apply only to FP values
rather than those + fcmp. See PR38086 for one of the original
discussions/motivations:
https://bugs.llvm.org/show_bug.cgi?id=38086

And the test here is derived from PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535

Currently, we lose FMF when converting any phi to select in
SimplifyCFG. There are a small number of similar changes needed
to correct within SimplifyCFG, so it should be quick to patch
this pass up.

FMF was extended to select and phi with:
D61917
D67564

Differential Revision: https://reviews.llvm.org/D70208
2019-11-15 16:14:35 -05:00
Richard Smith 7889d8e7eb Revert "[LoadStoreVectorize] Use '||' instead of '|' between sides with function calls. NFCI."
This broke two tests. Presumably the non-short-circuting '|' was
intentional here.

This reverts commit f7efea0ded.
2019-11-15 12:49:35 -08:00
Alexandre Ganea 478ad94c8e [GCOV] Skip artificial functions from being emitted
This is a patch to support  D66328, which was reverted until this lands.

Enable a compiler-rt test that used to fail previously with D66328.

Differential Revision: https://reviews.llvm.org/D67283
2019-11-15 14:23:11 -05:00
Francesco Petrogalli d6de5f12d4 [SVFS] Inject TLI Mappings in VFABI attribute.
This patch introduces a function pass to inject the scalar-to-vector
mappings stored in the TargetLIbraryInfo (TLI) into the Vector
Function ABI (VFABI) variants attribute.

The test is testing the injection for three vector libraries supported
by the TLI (Accelerate, SVML, MASSV).

The pass does not change any of the analysis associated to the
function.

Differential Revision: https://reviews.llvm.org/D70107
2019-11-15 18:42:56 +00:00
Fangrui Song 8bcd01f48a [ThinLTO] Fix -Wunused-function in NDEBUG builds after llvmorg-10-init-9933-g3d708bf5c26 2019-11-15 10:00:23 -08:00
Dávid Bolvanský f7efea0ded [LoadStoreVectorize] Use '||' instead of '|' between sides with function calls. NFCI.
Fixes warning from PVS Studio
2019-11-15 18:51:13 +01:00
evgeny 3d708bf5c2 Recommit "[ThinLTO] Add correctness check for RO/WO variable import"
ValueInfo has user-defined 'operator bool' which allows incorrect implicit conversion
to GlobalValue::GUID (which is unsigned long). This causes bugs which are hard to
track and should be removed in future.
2019-11-15 16:13:19 +03:00
Mikael Holmen 1587c7e86f [Scalarizer] Treat values from unreachable blocks as undef
Summary:
When scalarizing PHI nodes we might try to examine/rewrite
InsertElement nodes in predecessors. If those predecessors
are unreachable from entry, then the IR in those blocks could
have unexpected properties resulting in infinite loops in
Scatterer::operator[].
By simply treating values originating from instructions in
unreachable blocks as undef we do not need to analyse them
further.

This fixes PR41723.

Reviewers: bjope

Reviewed By: bjope

Subscribers: bjope, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70171
2019-11-15 11:13:37 +01:00
Francis Visoiu Mistrih a4c76be506 [InstCombine] Don't use getFirstNonPHI in FoldIntegerTypedPHI
getFirstNonPHI iterates over all the instructions in a block until it
finds a non-PHI.

Then, the loop starts from the beginning of the block and goes through
all the instructions until it reaches the instruction found by
getFirstNonPHI.

Instead of doing that, just stop when a non-PHI is found.

This reduces the compile-time of a test case discussed in
https://reviews.llvm.org/D47023 by 13x.

Not entirely sure how to come up with a test case for this since it's a
compile time issue that would significantly slow down running the tests.

Differential Revision: https://reviews.llvm.org/D70016
2019-11-14 17:52:01 -08:00
Reid Kleckner 4c1a1d3cf9 Add missing includes needed to prune LLVMContext.h include, NFC
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
2019-11-14 15:23:15 -08:00
Alexey Bataev bfa32573bf Revert "Temporarily Revert:"
This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:

    Temporarily Revert:

     "[SLP] Generalization of stores vectorization."
     "[SLP] Fix -Wunused-variable. NFC"
     "[SLP] Vectorize jumbled stores."

after fixing the problem with compile time.
2019-11-14 16:38:20 -05:00
Sanjay Patel 385572ccfe [InstCombine] remove duplicate code for simplifying a shuffle; NFCI
The transform is already handled by InstSimplify or earlier
in InstCombine, so trying to do it again is not necessary.
2019-11-14 13:12:25 -05:00
Benjamin Kramer 360f661733 Revert "[ThinLTO] Add correctness check for RO/WO variable import"
This reverts commit a2292cc537. Breaks
clang selfhost w/ThinLTO.
2019-11-14 16:07:13 +01:00
Simon Pilgrim edfc94e296 GCOVProfiling - fix uninitialized variable warnings + make getFuncChecksum() const. NFCI. 2019-11-14 14:21:18 +00:00
Simon Pilgrim 39c0829a55 WholeProgramDevirt - fix uninitialized variable warnings. NFCI. 2019-11-14 14:21:18 +00:00
Simon Pilgrim 8c09e472d5 Fix uninitialized variable warning. NFCI. 2019-11-14 14:21:17 +00:00
Simon Pilgrim ba229113a9 SROA - fix uninitialized variable warnings. NFCI. 2019-11-14 14:21:17 +00:00
Sjoerd Meijer cb47b87830 [LV] PreferPredicateOverEpilog respecting predicate loop hint
The vectoriser queries TTI->preferPredicateOverEpilogue to determine if
tail-folding is preferred for a loop, but it was not respecting loop hint
'predicate' that can disable this, which has now been added. This showed that
we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0
(i.e. FK_Disabled) but this should have been FK_Undefined, which has been
fixed.

Differential Revision: https://reviews.llvm.org/D70125
2019-11-14 13:10:44 +00:00
Daniil Suchkov 4c9d0da838 Revert "[InstCombine] Fold PHIs with equal incoming pointers"
This reverts commit a2f6ae9abf.
It is reverted due to clang-cmake-armv7-selfhost buildbot failure.
2019-11-14 17:42:01 +07:00
Daniil Suchkov a2f6ae9abf [InstCombine] Fold PHIs with equal incoming pointers
This is a resubmission of bbb29738b5 that
was reverted due to clang tests failures. It includes the fix and
additional IR tests for the missed case.

Summary:
In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.

Reviewers: spatel, RKSimon, lebedev.ri, apilipenko

Reviewed By: apilipenko

Tags: #llvm

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D68128
2019-11-14 17:04:32 +07:00
evgeny a2292cc537 [ThinLTO] Add correctness check for RO/WO variable import
This patch adds an assertion check for exported read/write-only
variables to be also in import list for module. If they aren't
we may face linker errors, because read/write-only variables are
internalized in their source modules. The patch also changes
export lists to store ValueInfo instead of GUID for performance
considerations.

Differential revision: https://reviews.llvm.org/D70128
2019-11-14 12:24:05 +03:00
Dimitry Andric 3db6783d8a Check result of emitStrLen before passing it to CreateGEP
Summary:
This fixes PR43081, where the transformation of `strchr(p, 0) -> p +
strlen(p)` can cause a segfault, if `-fno-builtin-strlen` is used.  In
that case, `emitStrLen` returns nullptr, which CreateGEP is not designed
to handle.  Also add the minimized code from the PR as a test case.

Reviewers: xbolva00, spatel, jdoerfert, efriedma

Reviewed By: efriedma

Subscribers: lebedev.ri, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70143
2019-11-14 08:04:36 +01:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Hiroshi Yamauchi 3f0969daf9 [PGO][PGSO] Temporarily disable the large working set size behavior.
Summary:
This temporarily disables the large working set size behavior in profile guided
size optimization due to internal benchmark regressions.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70207
2019-11-13 14:00:47 -08:00
Sanjay Patel a3e61946c5 [SLP] fix miscompile on min/max reductions with extra uses (PR43948)
The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-13 15:57:35 -05:00
Sanjay Patel e9bf7a60a0 [SLP] reduce code duplication for min/max vs. other reductions; NFCI 2019-11-13 11:26:08 -05:00
Sanjay Patel 3d6b53980c [InstCombine] propagate fast-math-flags (FMF) to select when inverting fcmp+select
As noted by the FIXME comment, this is not correct based on our current FMF semantics.
We should be propagating FMF from the final value in a sequence (in this case the
'select'). So the behavior even without this patch is wrong, but we did not allow FMF
on 'select' until recently.

But if we do the correct thing right now in this patch, we'll inevitably introduce
regressions because we have not wired up FMF propagation for 'phi' and 'select' in
other passes (like SimplifyCFG) or other places in InstCombine. I'm not seeing a
better incremental way to make progress.

That said, the potential extra damage over the existing wrong behavior from this
patch is very limited. AFAIK, the only way to have different FMF on IR in the same
function is if we have LTO inlined IR from 2 modules that were compiled using
different fast-math settings.

As seen in the tests, we may actually see some improvements with this patch because
adding the FMF to the 'select' allows matching to min/max intrinsics that were
previously missed (in the common case, the 'fcmp' and 'select' should have identical
FMF to begin with).

Next steps in the transition:

    Make similar changes in instcombine as needed.
    Enable phi-to-select FMF propagation in SimplifyCFG.
    Remove dependencies on fcmp with FMF.
    Deprecate FMF on fcmp.

Differential Revision: https://reviews.llvm.org/D69720
2019-11-13 10:38:42 -05:00
Simon Pilgrim d1bd5e476b SLPVectorizer - make comparison operators + isInSchedulingRegion const
Fixes cppcheck warnings.
2019-11-13 14:40:19 +00:00
Florian Hahn f7499011ca [InstCombine] Avoid moving ops that do restrict undef across shuffles.
I think we have to be a bit more careful when it comes to moving
ops across shuffles, if the op does restrict undef. For example, without
this patch, we would move 'and %v, <0, 0, -1, -1>' over a
'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the first
2 lanes of the result are undef after the combine, but they really
should be 0, unless I am missing something.

For ops that do fold to undef on undef operands, the current behavior
should be fine. I've add conservative check OpDoesRestrictUndef, maybe
there's a better existing utility?

Reviewers: spatel, RKSimon, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D70093
2019-11-13 13:40:34 +00:00
Daniil Suchkov cba4a27745 Temporarily revert "[InstCombine] Fold PHIs with equal incoming pointers"
Revert due to sanitizer-windows buildbot failure.

This reverts commit bbb29738b5.
2019-11-13 17:14:11 +07:00
Daniil Suchkov bbb29738b5 [InstCombine] Fold PHIs with equal incoming pointers
In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.

Reviewers: spatel, RKSimon, lebedev.ri, apilipenko

Reviewed By: apilipenko

Tags: #llvm

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D68128
2019-11-13 17:00:34 +07:00
Alina Sbirlea 4ae74cc99f [GVNHoist] Preserve AAResults.
Resolves PR38906, PR40898.
2019-11-12 14:10:04 -08:00
Tom Weaver 41c3f76dcd [DBG][OPT] Attempt to salvage or undef debug info when removing trivially deletable instructions in the Reassociate Expression pass.
Reviewed By: aprantl, vsk

Differential revision: https://reviews.llvm.org/D69943
2019-11-12 15:17:04 +00:00
Diana Picus 7f1dcc8952 [InstCombine] Skip scalable vectors in combineLoadToOperationType
Don't try to canonicalize loads to scalable vector types to loads
of integers.

This removes one assertion when trying to use a TypeSize as a parameter
to DataLayout::isLegalInteger. It does not handle the second part of the
function (which looks at bitcasts).

This patch also contains a NFC fix for Load Analysis, where a variable
initialization that would cause the same assertion is moved closer to
its use. This allows us to run the new test for InstCombine without
having to teach LocationSize to play nicely with scalable vectors.

Differential Revision: https://reviews.llvm.org/D70075
2019-11-12 12:27:09 +01:00
Florian Hahn 1ee93240c0 [LoopInterchange] Only skip PHIs with incoming values from the inner loop.
Currently we have limited support for outer loops with multiple basic
blocks after the inner loop exit. But the current checks for creating
PHIs for loop exit values only assumes the header and latches of the
outer loop. It is better to just skip incoming values defined in the
original inner loops. Those are handled earlier.

Reviewers: efriedma, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70059
2019-11-12 10:30:51 +00:00
Hideto Ueno 88b04ef832 [Attributor] Use must-be-executed-context in align deduction
Summary:
This patch introduces align attribute deduction for callsite argument, function argument, function returned and floating value based on must-be-executed-context.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69797
2019-11-12 06:41:19 +00:00
Vasileios Porpodas 6a18a95487 [SLP] Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897
2019-11-11 21:06:51 -08:00
Francesco Petrogalli e9a06e0606 [VFABI] Read/Write functions for the VFABI attribute.
The attribute is stored at the `FunctionIndex` attribute set, with the
name "vector-function-abi-variant".

The get/set methods of the attribute have assertion to verify that:

1. Each name in the attribute is a valid VFABI mangled name.

2. Each name in the attribute correspond to a function declared in the
   module.

Differential Revision: https://reviews.llvm.org/D69976
2019-11-12 03:40:42 +00:00
aqjune 4187cb138b Add InstCombine/InstructionSimplify support for Freeze Instruction
Summary:
- Add llvm::SimplifyFreezeInst
- Add InstCombiner::visitFreeze
- Add llvm tests

Reviewers: majnemer, sanjoy, reames, lebedev.ri, spatel

Reviewed By: reames, lebedev.ri

Subscribers: reames, lebedev.ri, filcab, regehr, trentxintong, llvm-commits

Differential Revision: https://reviews.llvm.org/D29013
2019-11-12 12:13:26 +09:00
Sanjay Patel 29f5d1670c Revert "[InstCombine] avoid crash from deleting an instruction that still has uses (PR43723) (3rd try)"
This reverts commit 3db8a3ef86.
This caused a different memory-sanitizer failure than earlier attempts,
but it's still not right.
2019-11-11 09:56:03 -05:00
Sanjay Patel 3db8a3ef86 [InstCombine] avoid crash from deleting an instruction that still has uses (PR43723) (3rd try)
Re-try because earlier attempts were reverted due to use-after-free.
Hopefully, diagnosed correctly this time - we replace/remove the
invariant.start first rather than the invariant.end to avoid angering
worklist-based iteration.

We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.

There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.start.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723

Differential Revision: https://reviews.llvm.org/D69977
2019-11-11 09:29:40 -05:00
Tom Weaver 9f48a160dd Revert "[DBG][OPT] Attempt to salvage or undef debug info when removing trivially deletable instructions in the Reassociate Expression pass."
This reverts commit 1984a27db5.
2019-11-11 14:13:33 +00:00
Tom Weaver 1984a27db5 [DBG][OPT] Attempt to salvage or undef debug info when removing trivially deletable instructions in the Reassociate Expression pass.
Reviewed By: aprantl, vsk

Differential revision: https://reviews.llvm.org/D69943
2019-11-11 13:47:13 +00:00
Tom Weaver 75af15d81e [NFC][TEST_COMMIT] Add fullstop to comment. 2019-11-11 13:38:34 +00:00
Jay Foad 9323ef4ecc [InstCombine] Simplify binary op when only one operand is a select
Summary:
SimplifySelectsFeedingBinaryOp simplified binary ops when both operands
were selects with the same condition. This patch extends it to handle
these cases where only one operand is a select:

X op (C ? P : Q) -> C ? (X op P) : (X op Q)
  // if X op P and X op Q both simplify
(C ? P : Q) op Y -> C ? (P op Y) : (Q op Y)
  // if P op Y and Q op Y both simplify

For example: X *fast (C ? 1.0 : 0.0) -> C ? X : 0.0

Reviewers: mcberg2017, majnemer, craig.topper, qcolombet, mcrosier

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64713
2019-11-11 10:01:59 +00:00
Craig Topper aafde063aa [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
The _m64 type is represented in IR as <1 x i64>. The x86-64 ABI
on Linux passes <1 x i64> as a double. MMX intrinsics use x86_mmx
type in IR.These things result in a lot of bitcasts in mmx code.
There's another instcombine that tries to turn bitcast <1 x i64>
to double into extractelement and a bitcast.

The combine here tries to reverse this extractelement conversion
if we see an mmx type.
2019-11-10 16:25:25 -08:00
Sanjay Patel d115b9fd4a Revert "[InstCombine] avoid crash from deleting an instruction that still has uses (PR43723) (2nd try)"
This reverts commit 56b2aee187.
Still causes a use-after-free on sanitizer bots.
2019-11-10 18:47:49 -05:00
Sanjay Patel 56b2aee187 [InstCombine] avoid crash from deleting an instruction that still has uses (PR43723) (2nd try)
Re-try rGef02831f0a4e (reverted due to use-after-free), but bail out completely
if we encounter an unexpected llvm.invariant.start.

We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.

There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.end.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723

Differential Revision: https://reviews.llvm.org/D69977
2019-11-10 17:26:36 -05:00
Stefan Stipanovic c250ebf7bc getArgOperandNo helper function.
Summary: A helper function to get argument number of a arg operand Use.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66844
2019-11-10 21:45:11 +01:00
Sanjay Patel b0ac26a632 Revert "[InstCombine] avoid crash from deleting an instruction that still has uses (PR43723)"
This reverts commit ef02831f0a.
Sanitizer bots fail with this change.
2019-11-10 11:18:05 -05:00
Sanjay Patel ef02831f0a [InstCombine] avoid crash from deleting an instruction that still has uses (PR43723)
We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.

There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.end.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723

Differential Revision: https://reviews.llvm.org/D69977
2019-11-10 09:18:11 -05:00
Simon Pilgrim 4ff246fef2 Remove unused variable (which allows us to remove vector include). NFC. 2019-11-10 12:16:23 +00:00
Gil Rapaport 7f152543e4 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)
This recommits 11ed1c0239 (reverted in
9f08ce0d21 for failing an assert) with a fix:
tryToWidenMemory() now first checks if the widening decision is to interleave,
thus maintaining previous behavior where tryToInterleaveMemory() was called
first, giving priority to interleave decisions over widening/scalarization. This
commit adds the test case that exposed this bug as a LIT.
2019-11-09 20:52:25 +02:00
Jay Foad d162e02cee Refactor SimplifySelectsFeedingBinaryOp for D64713. NFC. 2019-11-09 09:28:22 +00:00
Teresa Johnson b11391bb47 ThinLTO : Import always_inline functions irrespective of the threshold
Summary: A user can force a function to be inlined by specifying the always_inline attribute. Currently, thinlto implementation is not aware of always_inline functions and does not guarantee import of such functions, which in turn can prevent inlining of such functions.

Patch by Bharathi Seshadri <bseshadr@cisco.com>

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70014
2019-11-08 17:02:01 -08:00
Gil Rapaport 9f08ce0d21 Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)"
This reverts commit 11ed1c0239 - causes an assert failure.
2019-11-08 22:17:11 +02:00
evgeny 7f92d66f37 [ThinLTO] Fix bug when importing writeonly variables
Patch enables import of write-only variables with non-trivial initializers
to fix linker errors. Initializers of imported variables are converted to
'zeroinitializer' to avoid promotion of referenced objects.

Differential revision: https://reviews.llvm.org/D70006
2019-11-08 20:50:34 +03:00
Kazu Hirata 9aff5e1c18 [JumpThreading] Fix a comment typo (NFC)
Reviewers: kazu

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70013
2019-11-08 09:29:46 -08:00
Philip Reames 8d22100f66 [LICM] Support hosting of dynamic allocas out of loops
This patch implements a correct, but not terribly useful, transform. In particular, if we have a dynamic alloca in a loop which is guaranteed to execute, and provably not captured, we hoist the alloca out of the loop. The capture tracking is needed so that we can prove that each previous stack region dies before the next one is allocated. The transform decreases the amount of stack allocation needed by a linear factor (e.g. the iteration count of the loop).

Now, I really hope no one is actually using dynamic allocas. As such, why this patch?

Well, the actual problem I'm hoping to make progress on is allocation hoisting. There's a large draft patch out for review (https://reviews.llvm.org/D60056), and this patch was the smallest chunk of testable functionality I could come up with which takes a step vaguely in that direction.

Once this is in, it makes motivating the changes to capture tracking mentioned in TODOs testable. After that, I hope to extend this to trivial malloc free regions (i.e. free dominating all loop exits) and allocation functions for GCed languages.

Differential Revision: https://reviews.llvm.org/D69227
2019-11-08 08:19:48 -08:00
Philip Reames 787dba7aae [LICM] Hoisting of widenable conditions out of loops
The change itself is straight forward and obvious, but ... there's an existing test checking for exactly the opposite. Both I and Artur think this is simply conservatism in the initial implementation.  If anyone bisects a problem to this, a counter example will be very interesting.

Differential Revision: https://reviews.llvm.org/D69907
2019-11-08 08:19:48 -08:00
Gil Rapaport 11ed1c0239 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)
This recommits 100e797adb (reverted in
009e032634 for failing an assert). While the
root cause was independently reverted in eaff300401,
this commit includes a LIT to make sure IVDescriptor's SinkAfter logic does not
try to sink branch instructions.
2019-11-08 15:25:14 +02:00
Daniil Suchkov 7b9f5401a6 [NFC][IndVarS] Adjust a comment
(test commit)
2019-11-08 14:51:36 +07:00
Craig Topper 6749dc3446 [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement
x86_mmx is conceptually a vector already. Don't introduce an extra conversion between it and scalar i64.

I'm using VectorType::isValidElementType which checks for floating point, integer, and pointers to hopefully make this more readable than just blacklisting x86_mmx.

Differential Revision: https://reviews.llvm.org/D69964
2019-11-07 15:14:13 -08:00
Daniel Sanders 25ee861372 [debugify] Move the Debugify pass from tools/opt to lib/Transform/Utils
Summary:
I need to make use of this pass from a driver program that isn't opt.
Therefore this patch moves this pass into the LLVM library so that it is
available for use elsewhere.

There was one function I kept in tools/opt which is exportDebugifyStats()
this is because it's serializing the statistics into a human readable
format and this seemed more in keeping with opt than a library function

Reviewers: vsk, aprantl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69926
2019-11-07 14:41:54 -08:00
Vedant Kumar a087b78bc4 Wrong debug info generated at -O2 (-O0 is correct)
Instcombiner pass was erasing trivially dead instruction without updating dependent llvm.dbg.value.
which was not showing programmer current state of variables while debugging.
As a part of this fix I did following,
Iterate throught all the users (llvm.dbg) of a instruction which is trivially dead and set each if them undef, Before deleting the instruction.
Now user will see optimized out, when try to print those variables.
This fixes
https://bugs.llvm.org/show_bug.cgi?id=43893

This is my first fix to llvm.

Patch by kamlesh kumar!

Differential Revision: https://reviews.llvm.org/D69809
2019-11-07 11:19:41 -08:00
Sanjay Patel d9ccb6367a [InstCombine] canonicalize shift+logic+shift to reduce dependency chain
shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)

This is an IR translation of an existing SDAG transform added here:
rL370617

So we again have 9 possible patterns with a commuted IR variant of each pattern:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Part of the motivation is to allow easier recognition and subsequent
canonicalization of bswap patterns as discussed in PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146

We had to delay this transform because it used to allow the SLP vectorizer
to create awful reductions out of simple load-combines.
That problem was fixed with:
rL375025
(we'll bring back load combining in IR someday...)

The backend is also better equipped to deal with these patterns now
using hooks like TLI.getShiftAmountThreshold().

The only remaining potential controversy is that the -reassociate pass
tends to reverse this kind of pattern (to help GVN?). But since -reassociate
doesn't do anything with these specific patterns, there is no conflict currently.

Finally, there's a new pass proposal at D67383 for general tree-height-reduction
reassociation, and it could use a cost model to decide how to optimally rearrange
these kinds of ops for a target. That patch appears to be stalled.

Differential Revision: https://reviews.llvm.org/D69842
2019-11-07 12:09:45 -05:00
evgeny dde589389f [ThinLTO] Import readonly vars with refs
Patch allows importing declarations of functions and variables, referenced
by the initializer of some other readonly variable.
Differential revision: https://reviews.llvm.org/D69561
2019-11-07 15:13:35 +03:00
Sanjay Patel 7ff57705ba [SLP] allow forming 2-way reduction patterns
We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2

Or slightly reduced here:

define i1 @cmp2(<2 x double> %a0) {
  %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
  %b = extractelement <2 x i1> %a, i32 0
  %c = extractelement <2 x i1> %a, i32 1
  %d = and i1 %b, %c
  ret i1 %d
}

SLP would not attempt to turn this into a vector reduction because there is an
artificial lower limit on that transform. We can not completely remove that limit
without inducing regressions though, so this patch just hacks an extra attempt at
creating a 2-way reduction to the end of the analysis.

As shown in the test file, we are still not getting some of the motivating cases,
so follow-on patches will be needed to solve those cases.

Differential Revision: https://reviews.llvm.org/D59710
2019-11-07 06:08:42 -05:00
Eric Christopher 009e032634 Temporarily Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)"
as it's causing assert failures.

This reverts commit 100e797adb.
2019-11-06 21:58:28 -08:00
Wenlei He ba1dfae054 Keep import function list for inlinee profile update
Summary:
When adjusting function entry counts after inlining, Funciton::setEntryCount is called without providing an import function list. The side effect of that is the previously set import function list will be dropped. The import function list is used by ThinLTO to help import hot cross module callee for LTO inlining, so dropping that during ThinLTO pre-link may adversely affect LTO inlining. The fix is to keep the list while updating entry counts for inlining.

Reviewers: wmi, davidxl, tejohnson

Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69736
2019-11-06 18:36:00 -08:00
Eric Christopher e511c4b0df Temporarily Revert:
"[SLP] Generalization of stores vectorization."
 "[SLP] Fix -Wunused-variable. NFC"
 "[SLP] Vectorize jumbled stores."

As they're causing significant (10-30x) compile time regressions on
vectorizable code.

The primary cause of the compile-time regression is f228b53716.

This reverts commits:

f228b53716
5503455ccb
21d498c9c0
2019-11-06 16:06:15 -08:00
Philip Reames 8748be7750 [LoopPred] Enable new transformation by default
The basic idea of the transform is to convert variant loop exit conditions into invariant exit conditions by changing the iteration on which the exit is taken when we know that the trip count is unobservable.  See the original patch which introduced the code for a more complete explanation.

The individual parts of this have been reviewed, the result has been fuzzed, and then further analyzed by hand, but despite all of that, I will not be suprised to see breakage here.  If you see problems, please don't hesitate to revert - though please do provide a test case.  The most likely class of issues are latent SCEV bugs and without a reduced test case, I'll be essentially stuck on reducing them.

(Note: A bunch of tests were opted out of the new transform to preserve coverage.  That landed in a previous commit to simplify revert cycles if they turn out to be needed.)
2019-11-06 15:41:57 -08:00
Kazu Hirata f0f73ed8b0 [JumpThreading] Factor out code to clone instructions (NFC)
Summary:
This patch factors out code to clone instructions -- partly for
readability and partly to facilitate an upcoming patch of my own.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69861
2019-11-06 14:16:48 -08:00
Philip Reames 686f449e3d [WC] Fix a subtle bug in our definition of widenable branch
We had a subtle, but nasty bug in our definition of a widenable branch, and thus in the transforms which used that utility. Specifically, we returned true for any branch which included a widenable condition within it's condition, regardless of whether that widenable condition also had other uses.

The problem is that the result of the WC() call is defined to be one particular value. As such, all users must agree as to what that value is. If we widen a branch without also updating *all other users* of the WC in the same way, we have broken the required semantics.

Most of the textual diff is updating existing transforms not to leave dead uses hanging around. They're largely NFC as the dead instructions would be immediately deleted by other passes. The reason to make these changes is so that the transforms preserve the widenable branch form.

In practice, we don't get bitten by this only because it isn't profitable to CSE WC() calls and the lowering pass from guards uses distinct WC calls per branch.

Differential Revision: https://reviews.llvm.org/D69916
2019-11-06 14:16:34 -08:00
Philip Reames 9bfa5ab3d1 [LoopPred] Fix two subtle issues found by inspection
This patch fixes two issues noticed by inspection when going to enable the loop predication code in IndVarSimplify.

Issue 1 - Both the LoopPredication transform, and the already on by default optimizeLoopExits transform, modify the exit count of the exits they modify. (either to 0 or Infinity) Looking at the code more closely, this was not reflected into SCEV and we were instead running later transforms with incorrect SCEVs. Fixing this requires forgetting the loop, weakening a too strong assert, and updating SCEV to not pessimize results when a loop is provable untaken. I haven't been able to find a test case to demonstrate the miscompile.

Issue 2 - For modules without a data layout, we can end up with unsized pointer typed exit counts. Just bail out of this case.

I think these are the last two issues which need addressed before we enable this by default. The code has already survived a decent amount of fuzzing without revealing either of the above.

Differential Revision: https://reviews.llvm.org/D69695
2019-11-06 14:04:45 -08:00
Roman Lebedev 4fe94d0331
[LoopUnroll] countToEliminateCompares(): fix handling of [in]equality predicates (PR43840)
Summary:
I believe this bisects to https://reviews.llvm.org/D44983
(`[LoopUnroll] Only peel if a predicate becomes known in the loop body.`)

While that revision did contain tests that showed arguably-subpar peeling
for [in]equality predicates that [not] happen in the middle of the loop,
it also disabled peeling for the *first* loop iteration,
because latch would be canonicalized to [in]equality comparison..

That was intentional as per https://reviews.llvm.org/D44983#1059583.
I'm not 100% sure that i'm using correct checks here,
but this fix appears to be going in the right direction..

Let me know if i'm missing some checks here..

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43840 | PR43840 ]].

Reviewers: fhahn, mkazantsev, efriedma

Reviewed By: fhahn

Subscribers: xbolva00, hiraditya, zzheng, llvm-commits, fhahn

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69617
2019-11-06 15:08:59 +03:00
Sjoerd Meijer 6c2a4f5ff9 [TTI][LV] preferPredicateOverEpilogue
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
  and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040
2019-11-06 10:14:20 +00:00
Alina Sbirlea 4b698645d3 [LoopRotationUtils] Check values are newly inserted into maps.
This is a cleanup that came up in D63680.
All values added to the ValueMaps should be newly added.
2019-11-05 13:40:10 -08:00
Sergey Dmitriev 82588e05cc [SLP] - Add couple safety checks to TreeEntry::dump(). NFC
Summary: Check for MainOp and AltOp for NULL before dereferencing or issue NULL.

Reviewers: Vasilis, dtemirbulatov, RKSimon, ABataev

Reviewed By: ABataev

Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69812
2019-11-05 09:57:30 -08:00
Kazu Hirata 893afb9ca1 [JumpThreading] Factor out code to merge basic blocks (NFC)
Summary:
This patch factors out code to merge a basic block with its sole
successor -- partly for readability and partly to facilitate an
upcoming patch of my own.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69852
2019-11-05 09:46:57 -08:00
Gil Rapaport 100e797adb [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)
This recommits 2be17087f8 (reverted in
d3ec06d219 for heap-use-after-free) with a fix
in IAI's reset() which was not clearing the set of interleave groups after
deleting them.
2019-11-05 17:29:13 +02:00
Francis Visoiu Mistrih 47d1029788 [ObjC][ARC] Ignore lifetime markers between *ReturnValue calls
When eliminating a pair of

`llvm.objc.autoreleaseReturnValue`

followed by

`llvm.objc.retainAutoreleasedReturnValue`

we need to make sure that the instructions in between are safe to
ignore.

Other than bitcasts and useless GEPs, it's also safe to ignore lifetime
markers for both static allocas (lifetime.start/lifetime.end) and dynamic
allocas (stacksave/stackrestore).

These get added by the inliner as part of the return sequence and can
prevent the transformation from happening in practice.

Differential Revision: https://reviews.llvm.org/D69833
2019-11-05 06:39:22 -08:00
Kazu Hirata 0016c1f400 [JumpThreading] Factor out common code to update the SSA form (NFC)
Summary:
This patch factors out common code to update the SSA form in
JumpThreading.cpp -- partly for readability and partly to facilitate
an coming patch of my own.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69811
2019-11-05 06:15:44 -08:00
Simon Pilgrim 77debf51ab [GVN] Fix uninitialized variable warnings. NFCI. 2019-11-05 14:10:32 +00:00
Simon Pilgrim 1842fe6be3 Add missing GVN =operator. NFCI.
Fixes PVS Studio warning that the 'ValueTable' class implements a copy constructor, but lacks the '=' operator.
2019-11-05 13:41:50 +00:00
Roman Lebedev ccf1a5f4bb
[InstCombine] dropRedundantMaskingOfLeftShiftInput(): truncation (PR42563)
Summary:
That fold keeps growing and growing :(
I think this may be one of the last pieces for it.

Since D67677/D67725, the fold knowns the general form
of the pattern - where some masking is needed:
https://rise4fun.com/Alive/F5R
https://rise4fun.com/Alive/gslRa

But there is one more huge piece missing - if you are extracting some bits,
it is not impossible that the origin is wider than the extraction,
i.e. there may be a truncation. And we don't deal with that yet.

But we can, and the generalization remains fully identical:
https://rise4fun.com/Alive/Uar
https://rise4fun.com/Alive/5SW

After a preparatory cleanup i think the diff looks rather clean.

One missing piece is that in some patterns (especially pat. b),
`-1` only needs to be `-1` in final type, but that is for later..

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69125
2019-11-05 12:41:26 +03:00
Philip Reames 6ff439b57f [SimplifyCFG] Use a (trivially) dominanting widenable branch to remove later slow path blocks
This transformation is a variation on the GuardWidening transformation we have checked in as it's own pass. Instead of focusing on merge (i.e. hoisting and simplifying) two widenable branches, this transform makes the observation that simply removing a second slowpath block (by reusing an existing one) is often a very useful canonicalization. This may lead to later merging, or may not. This is a useful generalization when the intermediate block has loads whose dereferenceability is hard to establish.

As noted in the patch, this can be generalized further, and will be.

Differential Revision: https://reviews.llvm.org/D69689
2019-11-04 11:03:28 -08:00
Amy Huang ab76cfdd20 Recommit "[CodeView] Add option to disable inline line tables."
This reverts commit 004ed2b0d1.
Original commit hash 6d03890384

Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.

https://reviews.llvm.org/D67723
2019-11-04 09:15:26 -08:00
Alexey Bataev b80c41cd3c [SLP]Fix PR43799: Crash on different sizes of GEP indices.
Summary:
If the GEP instructions are going to be vectorized, the indices in those
GEP instructions must be of the same type. Otherwise, the compiler may
crash when trying to build the vector constant.

Reviewers: RKSimon, spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69627
2019-11-04 10:36:26 -05:00
Benjamin Kramer d3ec06d219 Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)"
This reverts commit 2be17087f8. Fails ASAN.
2019-11-04 15:04:42 +01:00
David Spickett 91167e22ec [hwasan] Remove lazy thread-initialisation
This was an experiment made possible by a non-standard feature of the
Android dynamic loader.

It required introducing a flag to tell the compiler which ABI was being
targeted.
This flag is no longer needed, since the generated code now works for
both ABI's.

We leave that flag untouched for backwards compatibility. This also
means that if we need to distinguish between targeted ABI's again
we can do that without disturbing any existing workflows.

We leave a comment in the source code and mention in the help text to
explain this for any confused person reading the code in the future.

Patch by Matthew Malcomson

Differential Revision: https://reviews.llvm.org/D69574
2019-11-04 10:58:46 +00:00
Gil Rapaport 2be17087f8 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)
The sink-after and interleave-group vectorization decisions were so far applied to
VPlan during initial VPlan construction, which complicates VPlan construction – also because of
their inter-dependence. This patch refactors buildVPlanWithRecipes() to construct a simpler
initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan
transformations.

Differential Revision: https://reviews.llvm.org/D68577
2019-11-04 10:37:39 +02:00
Dávid Bolvanský 058b5028de Reland '[InstructionCombining] Fixed null check after dereferencing warning. NFCI.' 2019-11-03 20:34:54 +01:00
Dávid Bolvanský 5b37c018d5 Revert "[InstructionCombining] Fixed null check after dereferencing warning. NFCI."
This reverts commit 8308187fd9. This exposed a bug.
2019-11-03 20:31:05 +01:00
Dávid Bolvanský d825ed24d2 Revert "[InstructionCompares] Fixed null check after dereferencing warning. NFCI."
This reverts commit b8685cf304.
2019-11-03 20:24:01 +01:00
Dávid Bolvanský b8685cf304 [InstructionCompares] Fixed null check after dereferencing warning. NFCI. 2019-11-03 20:13:45 +01:00
Dávid Bolvanský 8308187fd9 [InstructionCombining] Fixed null check after dereferencing warning. NFCI. 2019-11-03 20:10:46 +01:00
Dávid Bolvanský 8262a5b701 [CHR] Fixed null check after dereferencing warning. NFCI. 2019-11-03 20:06:38 +01:00
Dávid Bolvanský 914128ab12 [LoopUnrollRuntime] Fixed null check after dereferencing warning. NFCI. 2019-11-03 20:05:18 +01:00
Dávid Bolvanský 60cb193a40 [LoopUnrollAndJam] Fixed null check after dereferencing warning. NFCI. 2019-11-03 20:02:54 +01:00
Simon Pilgrim 81ba611e88 Ensure VPlanPrinter::Depth is initialized to fix static analyzer warning. NFCI. 2019-11-03 11:17:05 +00:00
Johannes Doerfert 77a6b358b5 [Attributor][NFCI] Do not track unnecessary dependences
If we do not look at assumed information there is no need to track
dependences.
2019-11-02 15:26:30 -05:00
Johannes Doerfert 680f638027 [Attributor][NFCI] Distinguish optional and required dependences
Dependences between two abstract attributes SRC and TRG come naturally in
two flavors:
  Either (1) "some" information of SRC is *required* for TRG to derive
  information, or (2) SRC is just an *optional* way for TRG to derive
  information.

While it is not strictly necessary to distinguish these types
explicitly, it can help us to converge faster, in terms of iterations,
and also cut down the number of `AbstractAttribute::update` calls.

As far as I can tell, we only use optional dependences for liveness so
far but that might change in the future. With this change the Attributor
can be informed about the "dependence class" and it will perform
appropriate actions when an Attribute is set to an invalid state, thus
one that cannot be used by others to derive information from.
2019-11-02 15:26:22 -05:00
Stefan Stipanovic f35740d6e9 NoFree argument attribute.
Summary: Deducing nofree atrribute for function arguments.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67886
2019-11-02 19:40:48 +01:00
Stefan Stipanovic 5fb1782918 Revert "NoFree argument attribute."
This reverts commit c12efa2ed0.
2019-11-02 17:31:02 +01:00
Stefan Stipanovic c12efa2ed0 NoFree argument attribute.
Summary: Deducing nofree atrribute for function arguments.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67886
2019-11-02 16:35:38 +01:00
Roman Lebedev c4b757be02
Revert BCmp Loop Idiom recognition transform (PR43870)
As discussed in https://bugs.llvm.org/show_bug.cgi?id=43870,
this transform is missing a crucial legality check:
the old (non-countable) loop would early-return upon first mismatch,
but there is no such guarantee for bcmp/memcmp.

We'd need to ensure that [PtrA, PtrA+NBytes) and [PtrB, PtrB+NBytes)
are fully dereferenceable memory regions. But that would limit
the transform to constant loop trip counts and would further
cripple it because dereferenceability analysis is *very* partial.

Furthermore, even if all that is done, every single test
would need to be rewritten from scratch.

So let's just give up.
2019-11-02 12:48:03 +03:00
Johannes Doerfert 2d77b0cad0 [Attributor] Ignore BlockAddress users in call site traversal
BlockAddress users will not "call" the function so they do not qualify
as call sites in the first place. When we delete a function with
BlockAddress users we need to first remove the body so they are properly
discarded.
2019-11-02 01:23:18 -05:00
Johannes Doerfert 07d16424f2 [Attributor][FIX] Do not try to cast if a cast is not required
When we replace constant returns at the call site we did issue a cast in
the hopes it would be a no-op if the types are equal. Turns out that is
not the case and we have to check it ourselves first.

Reused an IPConstantProp test for coverage. No functional change to the
test wrt. IPConstantProp.
2019-11-02 00:54:00 -05:00
Johannes Doerfert c7ab19dbb0 [Attributor][FIX] Transform invoke of nounwind to call + br %normal_dest
Even if the invoked function may-return, we can replace it with a call
and branch if it is nounwind. We had almost everything in place to do
this but did not which actually caused a crash when we removed the
landingpad from the actually dead unwind block.

Exposed by the IPConstantProp tests.
2019-11-02 00:54:00 -05:00
Johannes Doerfert 3cbe3314b4 [Attributor][FIX] Make "known" and "assumed" liveness explicit
We did merge "known" and "assumed" liveness information into a single
set which caused various kinds of problems, especially because we did
not properly record when something was actually known. With this patch
we properly track the "known" bit and distinguish dead ends we know from
the ones we still need to explore in future updates.
2019-11-02 00:49:29 -05:00
Johannes Doerfert 1b6041a9e8 [Attributor] `willreturn` + `noreturn` = UB
We gave up on `noreturn` if `willreturn` was known for a while but we
now again try to always derive `noreturn`. This is useful because a
function that is `noreturn` + `willreturn` is basically dead as
executing it would lead to undefined behavior (UB).

This came up in the IPConstantProp cases where a function only contained
a unreachable but was not marked `noreturn` which caused missed
opportunities down the line.
2019-11-02 00:35:22 -05:00
Johannes Doerfert e360ee6265 [Attributor][FIX] Make AAValueSimplifyArgument aware of thread-dependent constants
As in IPConstantProp, thread-dependent constants need not be propagated
over callbacks. Took the comment and test from there, see also D56447.
2019-11-02 00:32:39 -05:00
Johannes Doerfert ed47a9cde4 [Attributor][FIX] Handle the default case of a switch
In D69605 only the "cases" of a switch were handled but if none matched
we did not make the default case life. This is fixed now and properly
tested (with code from IPConstantProp/user-with-multiple-uses.ll).
2019-11-02 00:30:31 -05:00
Johannes Doerfert 15cd90a2c4 [Attributor][FIX] Make value simplification aware of "complicated" attributes
We cannot simply replace arguments that carry attributes like `nest`,
`inalloca`, `sret`, and `byval`. Except for the last one, which we can
replace if it is not written, we bail for now.
2019-11-02 00:29:17 -05:00
Johannes Doerfert c36e2ebf9f [Attributor][NFCI] Avoid unnecessary work except for testing
Trying to deduce information for declarations and calls sites of
declarations is not useful in practice but only for testing. Add a flag
that disables this by default but also enable it in the tests.

The misc.ll test will verify the flag "works" as expected.
2019-11-02 00:28:24 -05:00
Johannes Doerfert 0437bfcc83 [Attributor][FIX] NoCapture is not a subsuming property
We cannot look at the subsuming positions and take their nocapture bit
as shown with the two tests for which we derived nocapture on the call
site argument and readonly on the argument of the second before.
2019-11-02 00:26:15 -05:00
Johannes Doerfert 0c7d4d7f3e [Attributor][NFCI] Remove obsolete code
The code in question does not add anything as the class is a subclass of
AACallSiteReturnedFromReturnedAndMustBeExecutedContext already.
2019-11-02 00:25:46 -05:00
Teresa Johnson 16ec00eee7 Recommit "[ThinLTO] Handle GUID collision in import global processing""
This recommits cc0b9647b7 which was
reverted in d39d1a2f87.

I added a fix for an issue found when testing via distributed ThinLTO,
and added a test case for that failure.
2019-11-01 13:57:01 -07:00
Teresa Johnson d39d1a2f87 Revert "[LLD][ThinLTO] Handle GUID collision in import global processing"
This reverts commit cc0b9647b7.

The commit is causing a failure in internal testing. Will recommit with
a fix later.
2019-11-01 10:02:58 -07:00
Johannes Doerfert eb4f41dfe5 [Attributor] Really use the executed-context
Before we did not follow casts and geps when we looked at the users of a
pointer in the pointers must-be-executed-context. This caused us to fail
to determine if it was accessed for sure. With this change we follow
such users now.

The above extension exposed problems in getKnownNonNullAndDerefBytesForUse
which did not always check what the base pointer was. We also did not
handle negative offsets as conservative as we have to without explicit
loop handling. Finally, we should not derive a huge number if we access
a pointer that was traversed backwards first.

The problems exposed by this functional change are already tested in the
existing test cases as is the functional change.

Differential Revision: https://reviews.llvm.org/D69647
2019-10-31 15:09:45 -05:00
Alexey Bataev 70ad617dd6 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-31 16:02:25 -04:00
Johannes Doerfert 2d6d651e8c [Attributor] Make AANonNull perform context sensitive queries
Summary:
In order to get context sensitivity from isKnownNonZero we need to
provide a context instruction *and* a dominator tree. The latter is
passed now to which actually allows to remove some initialization code.

Tests taken from PR43833.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69595
2019-10-31 14:47:06 -05:00
Craig Topper 6773435624 [IPCP] Bail on extractvalue's with more than 1 index.
The replacement code only looks at the first index of the
extractvalue. If there are additional indices we'll end
up doing a bad replacement.

This only happens if the function returns a nested struct. Not
sure if clang ever generates such code. The original report came
from ispc.

Fixes PR43857

Differential Revision: https://reviews.llvm.org/D69656
2019-10-31 10:55:20 -07:00
Sanjay Patel a2240f57e7 [InstCombine] simplify fcmp+select canonicalization; NFCI
We had 2 blocks of code that are nearly identical. Existing
regression tests should cover both of the patterns.
2019-10-31 13:13:32 -04:00
David Green a5f7bc0de7 [InstCombine] Canonicalize uadd.with.overflow to uadd.sat
This adds some patterns to transform uadd.with.overflow to uadd.sat
(with usub.with.overflow to usub.sat too). The patterns selects from
UINTMAX (or 0 for subs) depending on whether the operation overflowed.

Signed patterns are a little more involved (they can wrap in two
directions), but can be added here in a followup patch too.

Differential Revision: https://reviews.llvm.org/D69245
2019-10-31 12:45:38 +00:00
Serguei Katkov 1eb04d289a [LICM] Invalidate SCEV upon instruction hoisting
Since SCEV can cache information about location of an instruction, it should be invalidated when the instruction is moved.
There should be similar bug in code sinking part of LICM, it will be fixed in a follow-up change.

Patch Author: Daniil Suchkov
Reviewers: asbirlea, mkazantsev, reames
Reviewed By: asbirlea
Subscribers: hiraditya, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D69370
2019-10-31 17:37:53 +07:00
Haojian Wu e65ddcafee Revert "[SLP] Vectorize jumbled stores."
This reverts commit 21d498c9c0.

This commit causes some crashes on some targets.
2019-10-31 10:21:24 +01:00
Johannes Doerfert 31784248ee [Attributor][NFCI] Improve the usage of IntegerStates
Setting the upper bound directly in the state can be beneficial and
simplifies the logic. This also exposed more copy&paste type errors.
2019-10-31 01:05:52 -05:00
Johannes Doerfert dac2d403a2 [Attributor] Make liveness "edge-based"
Summary:
If control is transferred to a successor is the key question when it
comes to liveness. The new implementation puts that question in the
focus and thereby providing a clean way to assume certain CFG edges are
dead or instructions will not transfer control.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69605
2019-10-31 00:35:18 -05:00
Johannes Doerfert cd4aab4a8a [Attributor] Liveness for values
Summary:
This patch introduces liveness (AAIsDead) for all positions, thus for
all kinds of values. For now, we say an instruction is dead if it would
be removed assuming all users are dead. A call site return is different
as we just look at the users. If all call site returns have been
eliminated, the return values can return undef instead of their original
value, eliminating uses.

We try to recursively delete dead instructions now and we introduce a
simple check interface for use-traversal.

This is the idea tried out in D68626 but implemented in the right way.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68925
2019-10-31 00:16:36 -05:00
Johannes Doerfert 5e442a51bc [Attributor][NFC] Do not delete dead blocks but "clear" them
Deleting blocks will require us to deal with dead edges, e.g.,
  `br i1 false, label %live, label %dead`
explicitly. For now we just clear the blocks and move on.
This will be revisited once we actually fold branches.
2019-10-31 00:09:50 -05:00
Johannes Doerfert 0be9cf2da9 [Attributor] Add "free"-based heap2stack deduction
Summary:
If there is a unique free of the allocated that has to be reached from
the malloc, we can apply the heap-2-stack transformation even if the
pointer escapes.

Reviewers: hfinkel, sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68958
2019-10-30 20:57:57 -05:00
Johannes Doerfert 2dad729f0c [Attributor][NFC] Eagerly mark attributes as fixed.
If an attribute did not query any optimistic (=non-fixed) information to
justify its state, we know the attribute state will not change anymore.
Thus, we can indicate an optimistic fixpoint.
2019-10-30 20:47:47 -05:00
Johannes Doerfert 12173e60ec [Attributor][NFC] Do not record dependences on fixed attributes
Since fixed values cannot change, we do not need to wait for it to
happen, we will never notify the dependent attribute anyway.
2019-10-30 20:44:03 -05:00
Johannes Doerfert b2083c5382 [Attributor][NFC] Simplify the IRPosition interface
We pretended IRPosition came either as mutable or immutable objects
while they are basically always immutable, with a single (existing)
unfortunate exceptions. This patch cleans up the uses to deal with the
immutable version.
2019-10-30 20:43:05 -05:00
Teresa Johnson c844f8846a [ThinLTO/WPD] Fix index-based WPD for available_externally vtables
Summary:
Clang does not add type metadata to available_externally vtables. When
choosing a summary to look at for virtual function definitions, make
sure we skip summaries for any available externally vtables as they will
not describe any virtual function functions, which are only summarized
in the presence of type metadata on the vtable def. Simply look for the
corresponding strong def's summary.

Also add handling for same-named local vtables with the same GUID
because of same-named files without enough distinguishing path.
In that case we return a conservative result with no devirtualization.

Reviewers: pcc, davidxl, evgeny777

Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69452
2019-10-30 17:59:08 -07:00
Amy Huang 004ed2b0d1 Revert "[CodeView] Add option to disable inline line tables."
because it breaks compiler-rt tests.

This reverts commit 6d03890384.
2019-10-30 17:31:12 -07:00
Amy Huang 6d03890384 [CodeView] Add option to disable inline line tables.
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.

See https://bugs.llvm.org/show_bug.cgi?id=42344

Reviewers: rnk

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67723
2019-10-30 16:52:39 -07:00
tyker c3b06d0c39 [InstCombine] keep assumption before sinking calls
Summary:
in the following C code the branch is not removed by clang in O3.
```
int f1(char* p) {
    int i1 = __builtin_strlen(p);
    if (!p)
        return -1;
    return i1;
}
```
The issue is that the call to strlen is sunk to the following block by instcombine. In its new place the call to strlen doesn't dominate the use in the icmp anymore so value tracking can't see that p cannot be null.
This patch resolves the issue by inserting an assumption at the place of the call before sinking a call when that call can be used to prove an argument to be nonnull.
This resolves this issue at O3.

Reviewers: majnemer, xbolva00, fhahn, jdoerfert, spatel, efriedma

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69477
2019-10-31 00:15:19 +01:00
Alexey Bataev 21d498c9c0 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-30 13:33:52 -04:00
Simon Pilgrim d52f5ed01a [SLPVectorizer] Use getAPInt() for comparison. NFCI.
Technically integers can assert on getZExtValue() if beyond i64 range, and a fuzzer usually find this.....
2019-10-30 16:16:55 +00:00
Karl-Johan Karlsson 760ed8da98 [AddressSanitizer] Only instrument globals of default address space
The address sanitizer ignore memory accesses from different address
spaces, however when instrumenting globals the check for different
address spaces is missing. This result in assertion failure. The fault
was found in an out of tree target.

The patch skip all globals of non default address space.

Reviewed By: leonardchan, vitalybuka

Differential Revision: https://reviews.llvm.org/D68790
2019-10-30 09:32:19 +01:00