Commit Graph

146169 Commits

Author SHA1 Message Date
Duncan P. N. Exon Smith 0db6488a77 Support: Add move semantics to mapped_file_region
Update llvm::sys::fs::mapped_file_region to have a move constructor and
a move assignment operator, allowing it to be used as an Optional. Also,
update FileOutputBuffer's OnDiskBuffer to take advantage of this,
avoiding an extra allocation from the unique_ptr.

A nice follow-up would be to make the mapped_file_region constructor
private and replace its use with a factory function, such as
mapped_file_region::create(), that returns an Expected (or ErrorOr). I
don't plan on doing that immediately, but I might swing back later.

No functionality change, besides the saved allocation in OnDiskBuffer.

Differential Revision: https://reviews.llvm.org/D100159
2021-04-09 17:56:26 -07:00
Mitch Phillips 092f288d36 Revert "[AMDGPU] Remove MachineDCE after SIFoldOperands"
This reverts commit 5a0117b2d0.

Reason: Dependent change d19a42eba9 broke
the ASan buildbots.
2021-04-09 15:47:44 -07:00
Mitch Phillips 3d4730a73f Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs"
This reverts commit d19a42eba9.

Reason: Broke the ASan buildbots. See the original phabricator review
for more details: https://reviews.llvm.org/D100188
2021-04-09 15:47:44 -07:00
Jessica Paquette 49c3565b9b [AArch64][GlobalISel] Swap compare operands when it may be profitable
This adds support for swapping comparison operands when it may introduce new
folding opportunities.

This is roughly the same as the code added to AArch64ISelLowering in
162435e7b5.

For an example of a testcase which exercises this, see
llvm/test/CodeGen/AArch64/swap-compare-operands.ll

(Godbolt for that testcase: https://godbolt.org/z/43WEMb)

The idea behind this is that sometimes, we may be able to fold away, say, a
shift or extend in a compare by swapping its operands.

e.g. in the case of this compare:

```
lsl x8, x0, #1
cmp x8, x1
cset w0, lt
```

The following is equivalent:

```
cmp x1, x0, lsl #1
cset w0, gt
```

Most of the code here is just a reimplementation of what already exists in
AArch64ISelLowering.

(See `getCmpOperandFoldingProfit` and `getAArch64Cmp` for the equivalent code.)

Note that most of the AND code in the testcase doesn't actually fold. It seems
like we're missing selection support for that sort of fold right now, since SDAG
happily folds these away (e.g testSwapCmpWithShiftedZeroExtend8_32 in the
original .ll testcase)

Differential Revision: https://reviews.llvm.org/D89422
2021-04-09 15:46:48 -07:00
Roman Lebedev 077bff39d4
[Analysis] isDereferenceableAndAlignedPointer(): recurse into select's hands
By doing this within the method itself,
we support traversing multiple levels of selects (TODO: PHI's),
fixing the SROA `std::clamp()` testcase.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47271
Mostly fixes https://bugs.llvm.org/show_bug.cgi?id=49909
2021-04-10 00:56:28 +03:00
Mitch Phillips 1a2756b777 Revert "[PowerPC] Add ROP Protection Instructions for PowerPC"
This reverts commit 16fe741c69.

Reason: Broke the UBSan buildbots. More information available in the
phabricator review: https://reviews.llvm.org/D99375
2021-04-09 13:36:41 -07:00
Jay Foad 5a0117b2d0 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-09 20:41:09 +01:00
Jay Foad d19a42eba9 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-09 20:41:09 +01:00
Alina Sbirlea f6bff8d157 [MSSA] Rename uses in IDF regardless of new def position in basic block.
When inserting a new def and renaming of uses is asked, always compute
IDF and do the renaming for the blocks with Phis in that IDF.
Resolves PR49859.

Differential Revision: https://reviews.llvm.org/D100163
2021-04-09 12:32:37 -07:00
Stefan Pintilie 5bca7cdafb Add correct types to the xxsplti32dx pattern.
Regiser types for xxsplti32dx for two td file patterns was incorrect.
Fixed the two types and added a test case that was reduced from a larger
failing test.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D100223
2021-04-09 14:11:34 -05:00
Duncan P. N. Exon Smith 365053d2a5 Support: Remove code duplication for mapped_file_region accessors, NFC 2021-04-09 11:49:40 -07:00
Thomas Lively f30c429da6 [WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR
When lowering a BUILD_VECTOR SDNode, we choose among various possible vector
creation instructions in an attempt to minimize the total number of instructions
used. We previously considered using swizzles, consts, and splats, and this
patch adds shuffles as well. A common pattern that now lowers to shuffles is
when two 64-bit vectors are concatenated. Previously, concatenations generally
lowered to sequences of extract_lane and replace_lane instructions when they
could have been a single shuffle.

Differential Revision: https://reviews.llvm.org/D100018
2021-04-09 11:21:49 -07:00
Alex Richardson 107189a26e [TableGen] Report an error message on a missing comma
I recently forgot a comma in a defm argument list and tablegen just
failed with exit code 1 without printing an error message. I believe
this issue was introduced in a9fc44c557.

This change prints the following instead:
.../clang/include/clang/Driver/Options.td:569:3: error: Expected comma before next argument

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D100178
2021-04-09 18:48:49 +01:00
Amara Emerson 40e75cafc0 [AArch64][GlobalISel] Fix incorrect codegen for <16 x s8> G_ASHR.
Fixes PR49904
2021-04-09 10:41:41 -07:00
Stefan Pintilie 16fe741c69 [PowerPC] Add ROP Protection Instructions for PowerPC
There are four new PowerPC instructions that are introduced in
Power 10. They are hashst, hashchk, hashstp, hashchkp.

These instructions will be used for ROP Protection.
This patch adds the four instructions.

Reviewed By: nemanjai, amyk, #powerpc

Differential Revision: https://reviews.llvm.org/D99375
2021-04-09 12:09:01 -05:00
Adrian Prantl 6ce76ff7eb Update the linkage name of coro-split functions in the debug info.
This patch updates the linkage name in the DISubprogram of coro-split
functions, which is particularly important for Swift, where the
funclets have a special name mangling. This patch does not affect C++
coroutines, since the DW_AT_specification is expected to hold the
(original) linkage name. I believe this is mostly due to limitations
in AsmPrinter, so we might be able to relax this restriction in the
future.

Differential Revision: https://reviews.llvm.org/D99693
2021-04-09 09:50:56 -07:00
Simon Pilgrim d8bc4de3cf [X86] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) on non-BMI targets (PR44136)
Followup to D100177, enable the fold for non-BMI targets as well.
2021-04-09 16:11:11 +01:00
Simon Pilgrim 245036950a [X86][BMI] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) (PR44136)
I've initially just enabled this for BMI which has the ANDN instruction for i32/i64 - the i16/i8 cases give an idea of what'd we get when we enable it in all cases (I'll do this as a later commit).

Additionally, the i16/i8 cases could be freely promoted to i32 (as the args are already zeroext) and we could then make use of ANDN + the free cmp0 there as well - this has come up in PR48768 and PR49028 so I'm going to look at this soon.

https://alive2.llvm.org/ce/z/QVWHP_
https://alive2.llvm.org/ce/z/pLngT-

Vector cases do not appear to benefit from this as we end up with having to generate the zero vector as well - this is one of the reasons I didn't try to tie this into hasAndNot/hasAndNotCompare.

Differential Revision: https://reviews.llvm.org/D100177
2021-04-09 15:52:03 +01:00
Sanjay Patel 84cdccc9dc [InstCombine] try to eliminate an instruction in min/max -> abs fold
As suggested in the review thread for 5094e12 and seen in the
motivating example from https://llvm.org/PR49885, it's not
clear if we have a way to create the optimal code without
this heuristic.
2021-04-09 10:34:03 -04:00
Momchil Velikov acf3279a03 For non-null pointer checks, do not descend through out-of-bounds GEPs
In LazyValueInfoImpl::isNonNullAtEndOfBlock we populate a set of
pointers, known to be non-null at the end of a block (e.g. because we
did a load through them). We then infer that any pointer, based on an
element of this set is non-null as well ("based" here meaning a
non-null pointer is the underlying object). This is incorrect, even if
the base pointer was non-null, the value of a GEP, that lacks the
inbounds` attribute, may be null.

This issue appeared as miscompilation of the following test case:

int puts(const char *);

typedef struct iter {
  int *val;
} iter_t;

static long distance(iter_t first, iter_t last) {
  long r = 0;
  for (; first.val != last.val; first.val++)
    ++r;
  return r;
}

int main() {
  int arr[2] = {0};
  iter_t i, j;
  i.val = arr;
  j.val = arr + 1;
  if (distance(i, j) >= 2)
    puts("failed");
  else
    puts("passed");
}

This fixes PR49662.

Differential Revision: https://reviews.llvm.org/D99642
2021-04-09 14:09:23 +01:00
Jay Foad a4ced03d34 [AMDGPU] SIFoldOperands: eagerly delete dead copies
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.

Differential Revision: https://reviews.llvm.org/D100117
2021-04-09 13:52:54 +01:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
dfukalov c1a88e007b [AA][NFC] Convert AliasResult to class containing offset for PartialAlias case.
Add an ability to store `Offset` between partially aliased location. Use this
storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98718
2021-04-09 13:26:09 +03:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Simon Pilgrim 3ae0a405fc [X86] combineHorizOpWithShuffle - peek through one use bitcasts when decoding shuffles.
Checking for one use, peek through bitcasts of the horizop args to allows us to merge shuffles of different widths through the horizop.
2021-04-09 10:51:04 +01:00
Max Kazantsev baf17e2cc9 [NFC] Move statictic increment out of helper 2021-04-09 16:32:35 +07:00
Sebastian Neubauer ba217b4655 [RegisterScavenging] Add asserts for better errors
These cases were failing before, but with cryptic asserts.
Add asserts in the RegScavenger that fail earlier with better
messages. NFC

Differential Revision: https://reviews.llvm.org/D100109
2021-04-09 11:23:36 +02:00
Sebastian Neubauer 36138db116 [AMDGPU] IsFlatScratch/Global -> FlatScratch/Global
Remove 'Is' from IsFlatScratch/Global. NFC

Differential Revision: https://reviews.llvm.org/D100108
2021-04-09 11:20:31 +02:00
Max Kazantsev 275f3a2540 [GVN][NFC] Factor out load elimination logic via PRE for reuse 2021-04-09 16:12:25 +07:00
Jim Lin 7eaa2810c4 [RISCV][NFC] Replace explicit type i64 with riscv customized SDTypeProfile.
New SDTypeProfile can be reused for other word operation patterns without explicit i64 type in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100097
2021-04-09 17:06:17 +08:00
Jim Lin 6169f1537c [RISCV][NFC] Fix formatting 2021-04-09 14:41:09 +08:00
Serguei Katkov f6e3b4fe58 [GreedyRA ORE] Re-factor computeNumberOfSplillsReloads.
Replace if-else to if-continue usage.
This simplifies further extension of the collected stats.
2021-04-09 12:44:11 +07:00
Arthur Eubanks 4c89bcadf6 [LICM] Hoist loads with invariant.group metadata
Previously loading the vtable used in calling a virtual method in a loop
was not hoisted out of the loop. This fixes that.

canSinkOrHoistInst() itself doesn't check that the load operands are
loop invariant, callers also check that separately.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99784
2021-04-08 21:57:37 -07:00
Serguei Katkov d2e15a83a6 [RS4GC] Cleanup meetBDVState. NFC.
meetBDVState looks pretty difficult to read and follow.
This is purely NFC but doing several things:

1) Combine meet and meetBDVState
2) Move the function to be a member of BDVState
3) Make BDVState be a mutable object
4) Convert switch to sequence of ifs
5) Adds comments.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99064
2021-04-09 10:20:25 +07:00
Jim Lin 49c79e3b56 [RISCV][NFC] Add explicit type i64 to RV64 only patterns.
Add explicit type i64 to RV64 only patterns to stop emitting unneeded i32 patterns.

It can reduce the isel table size.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100089
2021-04-09 09:37:04 +08:00
Alex Orlov f47a4c0713 [lld] Fixed CodeView GuidAdapter::format to handle GUID bytes in the right order.
This fixes https://bugs.llvm.org/show_bug.cgi?id=41712 bug.

Reviewed By: aganea

Differential Revision: https://reviews.llvm.org/D99978
2021-04-09 05:29:14 +04:00
David Blaikie 8294019633 Use default ref capture to avoid unused capture warning on assert-used variable 2021-04-08 17:37:55 -07:00
Duncan P. N. Exon Smith 6dc432510f Support: Use std::unique_ptr for SignpostEmitter::Impl, NFC, 3rd attempt
This reverts commit e35afbe535, reapplying
022ccedde8 and
e7ed5c920d.

- The first attempt missed defining `SignpostEmitterImpl`.
- The second attempt missed defining `llvm::SignpostEmitterImpl`.

Not sure how I failed to test both versions locally before; I thought
I'd turned the feature off via rerunning `cmake` but it must have been
stuck in place. This time I confirmed via `clang -E` that I was testing
both build configurations.

Original commit message:

    Replace some manual memory management with std::unique_ptr.

    Differential Revision: https://reviews.llvm.org/D100151
2021-04-08 17:05:59 -07:00
Duncan P. N. Exon Smith e35afbe535 Revert "Revert "Revert "Support: Use std::unique_ptr for SignpostEmitter::Impl, NFC"""
This reverts commit e7ed5c920d again, due
to more buildbot failures:
https://lab.llvm.org/buildbot/#/builders/131/builds/8191
2021-04-08 16:58:12 -07:00
Duncan P. N. Exon Smith e7ed5c920d Revert "Revert "Support: Use std::unique_ptr for SignpostEmitter::Impl, NFC""
This reverts commit 078072285d, reapplying
022ccedde8.

I figured out why this was failing in other environments: it's not a
problem with std::unique_ptr, but that SignpostEmitterImpl only has a
forward declaration. Adding an empty definition should do the trick.

Original commit message:

    Replace some manual memory management with std::unique_ptr.

    Differential Revision: https://reviews.llvm.org/D100151
2021-04-08 16:50:39 -07:00
Duncan P. N. Exon Smith 078072285d Revert "Support: Use std::unique_ptr for SignpostEmitter::Impl, NFC"
This reverts commit 022ccedde8. Looks like
some hosts need a definition of SignpostEmitterImpl to put it in a
unique_ptr:
https://lab.llvm.org/buildbot/#/builders/92/builds/7733
2021-04-08 16:38:47 -07:00
Duncan P. N. Exon Smith 9be4387434 Support: Avoid unnecessary std::function for SignpostEmitterImpl::SignpostLog
The destructor for SignPostEmitterImpl::SignpostLog is known statically. Avoid
the unnecessary vtable indirection through std::function in the std::unique_ptr
by turning LogDeleter into a struct. No real functionality change here.

Differential Revision: https://reviews.llvm.org/D100154
2021-04-08 16:34:22 -07:00
Duncan P. N. Exon Smith bf12b711f9 Support: Drop the no-op initializer for SignpostEmitterImpl::Signposts, NFC
This is a DenseMap, which has its own initializer; we don't need to explicitly
call the default constructor here.

Differential Revision: https://reviews.llvm.org/D100152
2021-04-08 16:34:00 -07:00
Duncan P. N. Exon Smith 022ccedde8 Support: Use std::unique_ptr for SignpostEmitter::Impl, NFC
Replace some manual memory management with std::unique_ptr.

Differential Revision: https://reviews.llvm.org/D100151
2021-04-08 16:31:59 -07:00
Duncan P. N. Exon Smith 429088b9e2 Support: Extract fs::resize_file_before_mapping_readwrite from FileOutputBuffer
Add a variant of `fs::resize_file` for use immediately before opening a
file with `mapped_file_region::readwrite`. On Windows, `_chsize`
(`ftruncate`) is slow, but `CreateFileMapping` (`mmap`) automatically
extends the file so the call to `fs::resize_file` can be skipped.

This optimization was added to `FileOutputBuffer` in
da9bc2e56d5a5c6332a9def1a0065eb399182b93; this commit just extracts the
logic out and adds a unit test.

Differential Revision: https://reviews.llvm.org/D95490
2021-04-08 16:26:35 -07:00
Craig Topper 872931e5d8 [RISCV] Use multiclass inheritance where possible for the VPat* multiclasses in RISVInstrInfoVPseudos. NFCI
Instead of instantiating multiclasses inside multiclasses, just
inherit from them.

We can do the same for the VPseudo* multiclasses, but that may
interfere with the scheduler class work.
2021-04-08 15:14:06 -07:00
Craig Topper ac347a8a0f [RISCV] Remove empty string after 'defm' at top level of vector .td files. NFC
This doesn't do anything so it's just wasted characters. I have
other plans for the ones in multiclasses.
2021-04-08 15:14:06 -07:00
Alexey Bataev ab124bbe2a [SLP]Fix PR49898: Infinite loop in SLP vectorizer.
We should not re-try attempt of finding of the consecutive store chain
if it was tried before.

Differential Revision: https://reviews.llvm.org/D100131
2021-04-08 14:18:06 -07:00
Philip Reames 35393c865c [funcattrs] Infer nosync from instruction walk
Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync.

A couple points of interest:
* I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere.
* The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately.
* I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising.

Differential Revision: https://reviews.llvm.org/D99769
2021-04-08 14:05:00 -07:00
Arthur Eubanks c5d1ccbcdf [GVN] Properly invalidate ICF cache when we simplify a value
This fixes a "Cached first special instruction is wrong!" assert.

The assert fires because replacing a value with another can cause an
instruction to no longer be "special" to ICF. In this case,
devirtualization happened, turning an indirect call to a
call to a willreturn function which is no longer special.

Reviewed By: nikic, rnk

Differential Revision: https://reviews.llvm.org/D99977
2021-04-08 14:01:57 -07:00
Konstantin Zhuravlyov 4fae63c612 AMDGPU: Add gfx90c support to code object v2 for backwards compatibility
Differential Revision: https://reviews.llvm.org/D100126
2021-04-08 16:42:43 -04:00
Stanislav Mekhanoshin 627dab3dbf [AMDGPU] Check for all meta instrs in GCNRegBankReassign
It used to work correctly even with a KILL, but there is
no reason to consider meta instructions since they do not
create real HW uses.

Differential Revision: https://reviews.llvm.org/D100135
2021-04-08 13:41:10 -07:00
Nikita Popov 59a2f67011 [LoopRotate] Don't split loop pass manager
After D99249 we use three different loop pass managers for LICM,
LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI
and LazyBPI are not preserved by LoopRotate (note that D74640
is no longer needed). Avoid this by marking them as preserved.

My understanding of D86156 is that it is okay to simply preserve
them (which LoopUnswitch already does for the same reason) and
rely on callbacks to deal with deleted blocks.

Differential Revision: https://reviews.llvm.org/D99843
2021-04-08 22:05:18 +02:00
Stanislav Mekhanoshin 189310a140 [AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode
Fixes: SWDEV-274276

Differential Revision: https://reviews.llvm.org/D100072
2021-04-08 12:46:36 -07:00
Wouter van Oortmerssen 04e9cd09c8 [WebAssembly] Fix for PIC external symbol ISEL
wasm64 was missing DAG ISEL patterns for external symbol based global.get, but simply adding these analogous to the existing 32-bit versions doesn't work.
This is because we are conflating the 32-bit global index with the pointer represented by the external symbol, which for wasm32 happened to work.
The simplest fix is to pretend we have a 64-bit global index. This sounds incorrect, but is immaterial since once this index is stored as a MachineOperand it becomes 64-bit anyway (and has been all along). As such, the EmitInstrWithCustomInserter based implementation I experimented with become a no-op and no further changes in the C++ code are required.

Differential Revision: https://reviews.llvm.org/D99904
2021-04-08 12:07:38 -07:00
Congzhe Cao ce2db9005d [LoopInterchange] Fix transformation bugs in loop interchange
After loop interchange, the (old) outer loop header should not jump to
the `LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of outer loop.

This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D98475
2021-04-08 14:58:13 -04:00
Levy Hsu 461b554999 [RISCV] Add InstAlias for Zbb Zbp and Zbs extension
Add InstAlias that allows the last operand to be an imm for following instructions:

1. Zbb or Zbp:
    - ror
    - rorw (RV64 Only)

2. Zbs
    - best
    - bclr
    - binv
    - bext

Reviewed By: craig.topper, jrtc27

Differential Revision: https://reviews.llvm.org/D100083
2021-04-08 11:51:31 -07:00
Sanjay Patel 5094e1279e [InstCombine] fold min/max intrinsic with negated operand to abs
The smax case shows up in https://llvm.org/PR49885 .
The others seem unlikely, but we might as well try
for uniformity (although that could mean an extra
instruction to create "nabs").

smax -- https://alive2.llvm.org/ce/z/8yYaGy
smin -- https://alive2.llvm.org/ce/z/0_7zc_
umax -- https://alive2.llvm.org/ce/z/EcsZWs
umin -- https://alive2.llvm.org/ce/z/Xw6WvB
2021-04-08 14:37:39 -04:00
Stanislav Mekhanoshin 5f0ac1ef78 Set IgnoreLLVMUsed to false in CallGraph::addToCallGraph()
clang++ uses llvm.compiler.used in certain cases to preserve
symbol which is fully inlined. D96087 has resulted in undefined
symbols in such cases. Set it to false by default to preserve
old behavior but keep the option for specific uses where we
want to ignore these (e.g. to detect a potential indirect call
to a function).

Differential Revision: https://reviews.llvm.org/D99897
2021-04-08 11:14:09 -07:00
Paul C. Anagnostopoulos 3f919ff250 Revert "[TableGen] Add support for the 'assert' statement in multiclasses"
This reverts commit 3b9a15d910.
2021-04-08 13:58:58 -04:00
Stephen Tozer 1b589172bd Revert "[DebugInfo] Correctly track SDNode dependencies for list debug values"
Reverted due to failure on the sanitizer-x86_64-linux-fast bot.

This reverts commit e10493eb50.
2021-04-08 17:55:45 +01:00
Stephen Tozer e10493eb50 [DebugInfo] Correctly track SDNode dependencies for list debug values
During SelectionDAG, we must track the SDNodes that each SDDbgValue depends on
to compute its value. These are ultimately derived from the location operands to
the SDDbgValue, but were stored in a separate vector prior to this patch. This
resulted in cases where one of the lists was updated incorrectly, resulting in
crashes during compilation. This patch fixes the issue by directly recomputing
the dependency list from the SDDbgOperands in getDependencies().

Differential Revision: https://reviews.llvm.org/D99423
2021-04-08 17:01:45 +01:00
Jay Foad a1a372dfb5 [AMDGPU] SIFoldOperands: remove an unneeded isReg check. NFC. 2021-04-08 16:37:43 +01:00
Jay Foad a250e91d10 [AMDGPU] SIFoldOperands: make use of emplace_back. NFC. 2021-04-08 14:34:10 +01:00
Jay Foad 2724b57ecd [AMDGPU] SIFoldOperands: remove an unneeded make_early_inc_range. NFC. 2021-04-08 14:32:36 +01:00
Jay Foad c28f79a0e3 [AMDGPU] SIFoldOperands: try harder to fold cndmask instructions
Look through copies to find more cases where the two values being
selected are identical. The motivation for this is just to be able to
remove the weird special case where tryFoldCndMask was called from
foldInstOperand, part way through folding a move-immediate into its
users, without regressing any lit tests.
2021-04-08 14:26:12 +01:00
Florian Hahn e4de3cdf3d [LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC).
Instead of passing the start value and the defined value to
widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be
used to get both (and more in future patches).
2021-04-08 14:25:10 +01:00
Joseph Tremoulet b785e03612 Support: mapped_file_region: Pass MAP_NORESERVE to mmap
This allows mapping larger files, delaying OOM failures until too many
pages of them are accessed.  This is makes the behavior of the
mapped_file_region in this regard consistent between its "Unix" and
"Windows" implementations.

Guard the code witih #if defined(MAP_NORESERVE), consistent with other
uses of MAP_NORESERVE in llvm-project, because some FreeBSD versions do
not provide this flag.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D96626
2021-04-08 09:07:25 -04:00
Jay Foad 3344cd3a14 [AMDGPU] SIFoldOperands: make tryFoldCndMask a member function. NFC. 2021-04-08 14:05:29 +01:00
Sebastian Neubauer c10cc4ea27 [AMDGPU] Fix computing live registers in prolog
ScratchExecCopy needs to be marked as live, we cannot use that register
while EXEC is stored in there.

Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as
available is unnecessary, they should not be live at that point anway.

Differential Revision: https://reviews.llvm.org/D100098
2021-04-08 14:52:50 +02:00
Paul C. Anagnostopoulos 14580ce2fd [TableGen] Make behavior of list slice suffix consistent across all values
Differential Revision: https://reviews.llvm.org/D99883
2021-04-08 08:38:44 -04:00
Paul C. Anagnostopoulos 3b9a15d910 [TableGen] Add support for the 'assert' statement in multiclasses 2021-04-08 08:36:03 -04:00
David Sherwood 1206313f82 [CodeGen][AArch64] Fix isel crash for truncating FP stores
When attempting to truncate a FP vector and store the result out
to memory we crashed because we had no pattern for truncating FP
stores. In fact, we don't support these types of stores and the
correct fix is to stop marking these truncating stores as legal.

Tests have been added here:

  CodeGen/AArch64/sve-fptrunc-store.ll

Differential Revision: https://reviews.llvm.org/D100025
2021-04-08 13:21:29 +01:00
Jay Foad 94a6fe43de [AMDGPU] SIFoldOperands: refactor tryFoldCndMask with early-outs. NFC. 2021-04-08 13:16:07 +01:00
Stephen Tozer 140757bfaa [DebugInfo] Prevent invalid debug info being produced during LoopStrengthReduce
During LoopStrengthReduce, some of the SSA values that are used by debug values
may be lost and/or salvaged. After LSR we attempt to recover any undef debug
values, including any that were salvaged but then lost their values afterwards,
by replacing the lost values with any live equal values (plus a possible
constant offset) that have been gathered prior to running LSR. When we do this
we restore the debug value's original DIExpression, to undo any salvaging (as we
have gone back to using the original debug value).

This process can currently produce invalid debug info if the number of operands
has changed by salvaging during LSR. Replacing old values during the
applyEqualValues step does not change the number of location operands, which
means that when we restore the old DIExpression we may have a mismatch between
the number of operands used by the debug value and the number of operands
referenced by the DIExpression. This patch fixes this by restoring the full
original location metadata at the start of the applyEqualValues step, so that
there is no mismatch in operand count between the debug value and its
DIExpression.

Differential Revision: https://reviews.llvm.org/D98644
2021-04-08 13:04:48 +01:00
Mikael Holmen 2a1f87167c [NVPTX] Fix compiler warning in NDEBUG build [NFC]
Without the fix we get

../lib/Target/NVPTX/NVPTXLowerArgs.cpp:236:24: error: lambda capture 'Arg' is not used [-Werror,-Wunused-lambda-capture]
  auto IsALoadChain = [Arg](Value *Start) {
                       ^~~
1 error generated.
2021-04-08 13:21:21 +02:00
Serguei Katkov a0e8738d45 [GreedyRA ORE] Add function level spill/reloads stats
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100014
2021-04-08 16:55:52 +07:00
David Green 8675ef100f [LV] Logical and/or select costs
D99674 stopped the folding of certain select operations into and/or, due
to incorrect folding in the presence of poison. D97360 added some costs
to attempt to account for the change, but only worked at the getUserCost
level, not the getCmpSelInstrCost that the vectorizer will use directly.
This adds similar logic into the vectorizer to handle these logical
and/or selects, treating them like and/or directly.

This fixes 60% performance regressions from code like the attached test
case.

Differential Revision: https://reviews.llvm.org/D99884
2021-04-08 10:39:47 +01:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
LemonBoy edb18ea5a9 [AsmParser] Recognize more escaped characters between single quotes
The GNU AS manual states the following about single-character constants enclosed within single quotes:

>  Some backslash escapes apply to characters, \b, \f, \n, \r, \t, and \" with the same meaning as for strings, plus \' for a single quote.

Add two more characters to the switch handling this case to match GAS behaviour, plus a test to make sure nothing regresses.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99609
2021-04-08 09:59:37 +02:00
Serguei Katkov 6b64c662c7 [GreedyRA ORE] Extract computeNumberOfSplillsReloads to use in different places. NFC.
Extract one basic block handling to introduce stat computation for method scope.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100013
2021-04-08 14:40:45 +07:00
Serguei Katkov df25787797 [GreedyRA ORE] Extract stats in RAGreedyStats struct. NFC.
Combine all collected stats into separate struct RAGreedyStats
with add and report methods.

The motivation is to extend the number of statistics to capture and instead of
adding new parameters, just combine all of them into one structure.
Additionally I plan to use report from different places in future to report data
for function as well.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100012
2021-04-08 14:27:37 +07:00
Serguei Katkov 0a1c6637a1 [GreedyRA ORE] Compute ORE stats if extra analysis is enabled
To save compile time, avoid computation of stats if ORE will not emit it.
The motivation is to add more stats and compute it only if it will dumped.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100010
2021-04-08 14:24:18 +07:00
Esme-Yi 0c36da722a [Debug-Info] Use inlined strings in .dwinfo section by default for DBX.
Summary: Set the default DwarfInlinedStrings as inlined strings for DBX, due to DBX does not support .dwstr section for now.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D99933
2021-04-08 07:20:22 +00:00
Hsiangkai Wang ba72bdef32 [RISCV] Add scalable offset under very large stack size.
If the stack size is larger than 12 bits, we have to use a scratch
register to store the stack size. Before we introduce the scalable stack
offset, we could simplify

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%0 = ADD %fp, %scratch

However, if the offset contains scalable part, we need to consider it.

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%scratch = ADD %fp, %scratch
%scalable_offset = ... # sequence of instructions for vscaled-offset.
%0 = ADD/SUB %scratch, %scalable_offset

Differential Revision: https://reviews.llvm.org/D100035
2021-04-08 14:46:05 +08:00
Juneyoung Lee 648544f998 [Constant] ConstantStruct/Array should not lower poison to undef
This is a (late) follow-up patch of 8871a4b4ca and
c95f39891a to make ConstantStruct::get/ConstantArray::getImpl
correctly return PoisonValue if all elements are poison.
This was found while discussing about the elements of a vector-typed UndefValue (D99853)
2021-04-08 15:23:12 +09:00
Hongtao Yu 2a2720a2de [CSSPGO] Move pseudo probes to the beginning of a block to unblock SelectionDAG combine.
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.

Test Plan:

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D100002
2021-04-07 22:45:35 -07:00
Serge Pavlov 65b1103798 [RISCV] DAG nodes and pseudo instructions for CSR access
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.

Differential Revision: https://reviews.llvm.org/D98936
2021-04-08 10:36:36 +07:00
hsmahesha ac64995ceb [AMDGPU] Only use ds_read/write_b128 for alignment >= 16
PS: Submitting on behalf of Jay.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100008
2021-04-08 08:12:05 +05:30
Chen Zheng 74e77295e7 [PowerPC] fixup killed flags for ri + addi to ri transformation
Fixup killed flags if DefMI and MI are not in the same basic blocks.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D100023
2021-04-07 22:04:08 -04:00
Congzhe Cao 593cb46550 Revert "[LoopInterchange] Fix transformation bugs in loop interchange"
This reverts commit 6ec68bd815d00c1eec2a6b9766452554f0e6cb61.
2021-04-07 21:17:30 -04:00
CongzheUalberta f5645ea65f [LoopInterchange] Fix transformation bugs in loop interchange
After loop interchange, the (old) outer loop header should not jump to
`LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of (new) outer loop.

This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D98475
2021-04-07 20:55:44 -04:00
Stanislav Mekhanoshin 37878de503 Disable use of SCC bit from asm
Differential Revision: https://reviews.llvm.org/D100069
2021-04-07 15:32:17 -07:00
Tony Tye 4658cd4c18 [AMDGPU] Update gfx90a memory model support
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100070
2021-04-07 22:17:58 +00:00
Stanislav Mekhanoshin d5d412f2ae [AMDGPU] Split GCNRegBankReassign
Allow pass to work separately with SGPR, VGPR registers or both.
This is NFC now but will be needed to split RA for separate
SGPR and VGPR passes.

Differential Revision: https://reviews.llvm.org/D100063
2021-04-07 14:45:13 -07:00
Sanjay Patel c0bbd0cc35 [InstCombine] fold not ops around min/max intrinsics
This is another step towards parity with the existing
cmp+select folds (see D98152).
2021-04-07 17:31:36 -04:00
Craig Topper 56ea2e2fdd [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00
Craig Topper 9895285191 [RISCV] Replace 'return ReplaceNode' with 'ReplaceNode; return;' NFC
ReplaceNode is a void function as is the function that we were
doing this in. While this is valid code, it was a bit confusing.
2021-04-07 12:18:41 -07:00
Jonas Hahnfeld 6415f424bc [AArch64] Materialize FP constant in code for large code model
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

For testing, restore literal_pools_float.ll to exercise the constant
pool and add two optnone-functions that return a float and a double,
respectively. Consolidate fpimm.ll and add a new fast-isel-fpimm.ll
to check the code paths taken with FastISel.

Differential Revision: https://reviews.llvm.org/D99607
2021-04-07 21:02:05 +02:00
Arthur Eubanks 90af134473 Revert "[AsmPrinter] Delete dead takeDeletedSymbsForFunction()"
This reverts commit 9583a3f262.

This wasn't NFC as initially thought. Needed for D99707.
2021-04-07 11:40:44 -07:00
Abhina Sreeskantharajan 5c8462b5da [Windows] Remove global OF_None flag for Windows in ToolOutputFiles
Since we have created a new OF_TextWithCRLF flag, we no longer need to worry about OF_Text flag turning on CRLF translation. I can remove this workaround I added to globally open all ToolOutputFiles as binary on Windows.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D100034
2021-04-07 14:10:04 -04:00
Craig Topper f087d7544a [RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32.
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.

Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.

For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.

That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.

The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99910
2021-04-07 10:44:53 -07:00
Craig Topper 67953311e2 [SelectionDAG] Teach SelectionDAG::FoldConstantArithmetic to handle SPLAT_VECTOR
This allows FoldConstantArithmetic to handle SPLAT_VECTOR in
addition to BUILD_VECTOR. This allows it to support scalable
vectors. I'm also allowing fixed length SPLAT_VECTOR which is
used by some targets, but I'm not familiar enough to write tests
for those targets.

I had to block this function from running on CONCAT_VECTORS to
avoid calling getNode for a CONCAT_VECTORS of 2 scalars.
This can happen because the 2 operand getNode calls this
function for any opcode. Previously we were protected because
CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR
before that call. But it's not always possible to fold a CONCAT_VECTORS
of SPLAT_VECTORs, and we don't even try.

This fixes PR49781 where DAG combine thought constant folding
should be possible, but FoldConstantArithmetic couldn't do it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D99682
2021-04-07 10:03:33 -07:00
Craig Topper 5fc0e98d9a [LoopIdiomRecognize] Minor cleanups to the FFS idiom matching. NFC
-Make sure of the CreateShl/LShr/AShr methods that take a uint64_t
instead of creating a ConstantInt for 1 ourselves.
-Use Builder.getInt1 or ConstantInt::getBool instead of a conditional.
-Pull out repeated calls to getType.
2021-04-07 10:03:14 -07:00
Roman Lebedev 24f67473dd
[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants
All of the code that handles general constant here (other than the more
restrictive APInt-dealing code) expects that it is an immediate,
because otherwise we won't actually fold the constants, and increase
instruction count. And it isn't obvious why we'd be okay with
increasing the number of constant expressions,
those still will have to be run..

But after 2829094a8e
this could also cause endless combine loops.
So actually properly restrict this code to immediates.
2021-04-07 19:50:19 +03:00
Sanjay Patel 1894c6c59e [InstCombine] avoid infinite loop from partial undef vectors
This fixes the examples from
D99674 and
https://llvm.org/PR49878

The matchers succeed on partial undef/poison vector constants,
but the transform creates a full 'not' (-1) constant, so it
would undo a demanded vector elements change triggered by the
extractelement.

Differential Revision: https://reviews.llvm.org/D100044
2021-04-07 12:18:12 -04:00
wlei 6d5132b426 [CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner
We see a regression related to low probe factor(0.01) which prevents some callsites being promoted in ICPPass and later cause the missing inline in CGSCC inliner. The root cause is due to redundant(the second) multiplication of the probe factor and this change try to fix it.

`Sum` does multiply a factor right after findCallSamples but later when using as the parameter in setProbeDistributionFactor, it multiplies one again.

This change could get ~2% perf back on mcf benchmark. In mcf, previously the corresponding factor is 1 and it's the recent feature introducing the <1 factor then trigger this bug.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D99787
2021-04-07 08:48:59 -07:00
Abhina Sreeskantharajan 1bcf58b213 [SystemZ][z/OS][TableGen] TableGen files should be text
This patch sets tablegen files as text. It should have no effect on Windows after this patch landed https://reviews.llvm.org/rG82b3e28e836d2f5c8cfd6e1047b93c088522365a.

Reviewed By: anirudhp

Differential Revision: https://reviews.llvm.org/D100036
2021-04-07 11:23:00 -04:00
Sam Clegg f23b259e18 [WebAssembly] Improve error messages regarding missing indirect function table. NFC
Use report_fatal_error here since this is an internal error, and not
something the user can/should be trying to fix.

Also distinguish between the symbol being missing and the symbol having
the wrong type.

We have a failure internally where the symbol is missing.  Currently
trying to reduce the test case to something we can attach to an llvm
bug.

Differential Revision: https://reviews.llvm.org/D99960
2021-04-07 07:58:43 -07:00
Sebastian Neubauer 2dc6be5209 [AMDGPU] Update SGPRSpillVGPRCSR name. NFC
The struct is used for both, callee and caller-save registers now.
The frame index is not set for entrypoints, as we do not need to save
the registers then.
Update the struct name to reflect that.

Differential Revision: https://reviews.llvm.org/D99722
2021-04-07 16:30:40 +02:00
Jingu Kang 798b0fd36b [NPM] Fix typo inisLTOPreLink for loop rotate
Differential Revision: https://reviews.llvm.org/D100033
2021-04-07 15:08:37 +01:00
Simon Pilgrim 302e748065 [X86] Improve optimizeCompareInstr for signed comparisons after AND/OR/XOR instructions
Extend D94856 to handle 'and', 'or' and 'xor' instructions as well

We still fail on many i8/i16 cases as the test and the logic-op are performed on different widths
2021-04-07 14:28:42 +01:00
Alexey Bataev a78e86e6be [SLP]Avoid multiple attempts to vectorize CmpInsts.
No need to lookup through and/or try to vectorize operands of the
CmpInst instructions during attempts to find/vectorize min/max
reductions. Compiler implements postanalysis of the CmpInsts so we can
skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save
compile time.

Differential Revision: https://reviews.llvm.org/D99950
2021-04-07 06:15:42 -07:00
Jay Foad bf6cab6f07 [AMDGPU] SIFoldOperands: don't dump extra '\n' after MachineInstr. NFC. 2021-04-07 14:13:00 +01:00
Sanjay Patel 0333ed8e0c [InstCombine] move abs transform to helper function; NFC
The swap of the operands can affect later transforms that
are expecting a constant as operand 1. I don't think we
can trigger a bug with the current code, but I hit that
problem while drafting a new transform for min/max intrinsics.
2021-04-07 08:35:07 -04:00
Simon Pilgrim 583258723f [X86] Improve optimizeCompareInstr for signed comparisons after BZHI instructions
Extend D94856 to handle 'bzhi' instructions as well
2021-04-07 12:07:26 +01:00
Yevgeny Rouban 3e738afae4 [Statepoint Lowering] Allow other than N byte sized types in deopt bundle
I do not see any bit-width restriction from the point of the
LLVM Lang Ref - Operand Bundles on the types of the deopt bundle
operands. Statepoint Lowering seems to be able to work with any
types.
This patch relaxes the two related assertions and adds a new test
for this change.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100006
2021-04-07 17:48:31 +07:00
Roman Lebedev 2829094a8e
Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)
This reverts commit a547b4e26b,
relanding commit 31d219d299,
which was reverted because there was a conflicting inverse transform,
which was causing an endless combine loop, which has now been adjusted.

Original commit message:

https://alive2.llvm.org/ce/z/67w-wQ

We prefer `add`s over `sub`, and this particular xform
allows further folds to happen:

Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
2021-04-07 12:06:25 +03:00
Roman Lebedev 93d1d94b74
[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants
I.e., if any/all of the consants is an expression, don't do it.
Since those constants won't reduce into an immediate,
but would be left as an constant expression, they could cause
endless combine loops after 31d219d299
added an inverse transformation.
2021-04-07 12:06:24 +03:00
Qiu Chaofan 033c9c2552 [PowerPC] Fix use check of swap-reduction
This will fix swap-reduction in DAGISel for cases where COPY_TO_REGCLASS
has multiple uses.
2021-04-07 15:55:52 +08:00
Max Kazantsev fee330824a [SCEV] Fix false-positive recognition of simple recurrences. PR49856
A value from reachable block may come to a Phi node as its input from
unreachable block. This may confuse matchSimpleRecurrence  which
has no access to DomTree and can falsely recognize something as a recurrency
because of this effect, as the attached test shows.

Patch `ae7b1e` deals with half of this problem, but it only accounts from
the case when an unreachable instruction comes to Phi as an input.

This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).

Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
2021-04-07 13:55:17 +07:00
Petr Hosek a547b4e26b Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)"
This reverts commit 31d219d299 which
causes an infinite loop when compiling the XRay runtime.
2021-04-06 22:30:28 -07:00
Jonas Devlieghere 162c2759b6 [dsymutil] Stop emulating dsymutil-classic CIE caching behavior
Stop emulating dsymutil-classic which only cached the last used CIE for
reuse.
2021-04-06 20:15:41 -07:00
Jonas Devlieghere 233c24330b [dsymutil] Don't keep old abbreviations
Don't keep the old abbreviations around. This code existed for
compatibility with dsymutil-classic.
2021-04-06 19:50:17 -07:00
Jonas Devlieghere 5d07dc8977 [dsymutil] Don't emit .debug_pubnames and .debug_pubtypes
Consider the .debug_pubnames and .debug_pubtypes their own kind of
accelerator and stop emitting them together with the Apple-style
accelerator tables. The only reason we were still emitting both was for
(byte-for-byte) compatibility with dsymutil-classic.

 - This patch adds a new accelerator table kind "Pub" which can be
   specified with --accelerator=Pub.
 - This patch removes the ability to emit both pubnames/types and apple
   style accelerator tables. I don't think anyone is relying on that but
   it's worth pointing out.
 - This patch removes the --minimize option and makes this behavior the
   default. Specifying the flag will result in a warning but won't abort
   the program.

Differential revision: https://reviews.llvm.org/D99907
2021-04-06 19:01:45 -07:00
Yevgeny Rouban b5c63e30ca [NewPM] Set verify-cfg-preserved=1 by default for debug builds 2021-04-07 08:34:30 +07:00
Craig Topper 01a23dccb1 [RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer. 2021-04-06 16:48:40 -07:00
Nicolás Alvarez a1aada75f5 [docs] Fix doxygen comments wrongly attached to the llvm namespace
Looking at the Doxygen-generated documentation for the llvm namespace
currently shows all sorts of random comments from different parts of the
codebase. These are mostly caused by:

- File doc comments that aren't marked with \file, so they're attached to
  the next declaration, which is usually "namespace llvm {".
- Class doc comments placed before the namespace rather than before the
  class.
- Code comments before the namespace that (in my opinion) shouldn't be
  extracted by doxygen at all.

This commit fixes these comments. The generated doxygen documentation now
has proper docs for several classes and files, and the docs for the llvm
and llvm::detail namespaces are now empty.

Reviewed By: thakis, mizvekov

Differential Revision: https://reviews.llvm.org/D96736
2021-04-07 01:20:18 +02:00
Craig Topper 2641c1f15e [RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal.
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
2021-04-06 15:00:33 -07:00
Sidharth Baveja d81d9e8b86 [SplitEdge] Update SplitCriticalEdge to return a nullptr only when the edge is not critical
Summary:
The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in
cases where the edge is a critical. SplitEdge uses SplitCriticalEdge assuming it
can always split all critical edges, which is an incorrect assumption.

The three cases where the function SplitCriticalEdge will return a nullptr is:
1. DestBB is an exception block
2. Options.IgnoreUnreachableDests is set to true and
isa(DestBB->getFirstNonPHIOrDbgOrLifetime()) is not equal to a nullptr
3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true)
and it cannot be maintained for a loop due to indirect branches

For each of these situations they are handled in the following way:
1. Modified the function ehAwareSplitEdge originally from
llvm/lib/Transforms/Coroutines/CoroFrame.cpp to handle the cases when the DestBB
is an exception block. This function is called directly in SplitEdge.
SplitEdge does not call SplitCriticalEdge in this case
2. Options.IgnoreUnreachableDests is set to false by default, so this situation
does not apply.
3. Return a nullptr in this situation since the SplitCriticalEdge also returned
nullptr. Nothing we can do in this case.

Reviewed By: asbirlea

Differential Revision:https://reviews.llvm.org/D94619
2021-04-06 21:24:40 +00:00
Philip Reames 4bf8985f4f Replace calls to IntrinsicInst::Create with CallInst::Create [nfc]
There is no IntrinsicInst::Create.  These are binding to the method in the super type.  Be explicitly about which method is being called.
2021-04-06 13:23:58 -07:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Artem Belevich d0615a93bb [NVPTX] Handle bitcast and ASC(101) when trying to avoid argument copy.
This allows us to skip the copy in few more cases.

Differential Revision: https://reviews.llvm.org/D99979
2021-04-06 13:06:00 -07:00
Philip Reames 9ef6aa020b Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Luís Marques 0c3bc1f3a4 [ASan][RISCV] Fix RISC-V memory mapping
Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and
D87581). This should be an improvement both in terms of first principles
soundness and observed test failures --- test failures would occur
non-deterministically depending on the ASLR random offset.

On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as
`PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor makes the result
be the not very round number 0x1555556000. That address had to be further
rounded to ensure page alignment after the shadow scale shifting is applied.
Still, that value explains why the mapping table may look less regular than
expected.

Further cleanups:
- Moved the mapping table comment, to ensure that the two Linux/AArch64
tables stayed together;
- Removed mention of Sv48. Neither the original mapping nor this one are
compatible with an actual Linux Sv48 address space (mainline Linux still
operates Sv48 in Sv39 mode). A future patch can improve this;
- Removed the additional comments, for consistency.

Differential Revision: https://reviews.llvm.org/D97646
2021-04-06 20:46:17 +01:00
Amy Kwan bd6033eca7 [PowerPC] Materialize 34-bit constants with pli directly
Previously, 34-bit constants were materialized in selectI64Imm(), and we relied
on td pattern matching to instead produce a pli. This becomes problematic as
there is no guarantee that the 34-bit constant will reach the td pattern
selection for pli. It is also possible for other transformations (such as complex
bit permutations) to also produce and utilize the 34-bit constant materialized
through selectI64Imm().

This patch instead produces pli on Power10 directly whenever the constant fits
within 34-bits.

Differential Revision: https://reviews.llvm.org/D99906
2021-04-06 13:38:11 -05:00
Fangrui Song a7ef45bc5c [NewPM] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off builds after D91327 2021-04-06 11:30:20 -07:00
Philip Reames fb41cae039 More precisely type code used for gc.relocate assertions [nfc] 2021-04-06 11:27:36 -07:00
Philip Reames a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Arthur Eubanks 4e83e59eb8 [GVN] Add missing ICF update
performScalarPREInsertion() inserts instructions into blocks that we
need to tell ImplicitControlFlowTracking about, otherwise the ICF cache
may be invalid.

Fixes PR49193.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99909
2021-04-06 10:13:42 -07:00
Florian Hahn 4059c1c32d [SimplifyInst] Use correct type for GEPs with vector indices.
The current code does not properly handle vector indices unless they are
the first index.

At the moment LangRef gives the impression that the vector index must be
the one and only index (https://llvm.org/docs/LangRef.html#getelementptr-instruction).

But vector indices can appear at any position and according to the
verifier there may be multiple vector indices. If that's the case, the
number of elements must match.

This patch updates SimplifyGEPInst to properly handle those additional
cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99961
2021-04-06 17:56:10 +01:00
Craig Topper 3ae03f67fe [RISCV] Add helper function to share some of the code for isel of vector load/store intrinsics.
Many of the operands are handled the same or in the same order
for all these intrinsics. Factor out the code for selecting and
pushing them into the Operands vector.

Differential Revision: https://reviews.llvm.org/D99923
2021-04-06 09:54:24 -07:00
Jay Foad 8f798566a3 [AMDGPU] SIFoldOperands: use isUseMIInFoldList. NFC. 2021-04-06 17:53:48 +01:00
Paul Robinson 04b3c8c52c Pass -fcrash-diagnostics-dir along to LLVM
This allows frontend and backend diagnostic files to all go into the
same place.  Have it control the Windows (mini-)dump location.

Differential Revision: https://reviews.llvm.org/D99199
2021-04-06 09:30:52 -07:00
Victor Huang f98567b3fe [AIX][TLS] Add support for TLS variables to XCOFF object writer
This patch adds support for TLS variables to the XCOFF object writer:
- Add TData and TBSS sections
- Add CsectGroups for the mapping classes XCOFF::XMC_TL and XCOFF::XMC_UL
- Add XMC_UL in the enum entry of CsectStorageMapping class to print the string
  while reading the symbol properties for TLS variables
- Fix the starting address of TData and TBSS sections

Reviewed by: hubert.reinterpretcast, DiggerLin

Differential Revision: https://reviews.llvm.org/D98946
2021-04-06 10:46:07 -05:00
Simon Pilgrim 53283cc2f1 [X86][SSE] canonicalizeShuffleWithBinOps - add MOVSD/MOVSS handling. 2021-04-06 16:42:18 +01:00
Philip Reames 21d4839948 Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc]
These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.
2021-04-06 08:33:15 -07:00
Konstantin Zhuravlyov 844012940e AMDGPU: Add isBranch=1 to SOPP branch instructions
Differential Revision: https://reviews.llvm.org/D99955
2021-04-06 10:59:30 -04:00
Philip Reames 52ecd94cfb Remove last remnants of PR49607 migration [NFC]
The key change (4f5e92c) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout.  Time to remove the code added to make reverting easy.
2021-04-06 07:56:55 -07:00
Jan Svoboda fb6a5237aa Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction"
This reverts commit 167ea67d

This causes a bunch of build failures:
* http://lab.llvm.org:8011/#/builders/121/builds/6287
* http://green.lab.llvm.org/green/job/clang-stage1-RA/19915
2021-04-06 16:33:28 +02:00
Benjamin Kramer ce4acb01b3 Avoid unused variable warning in Release builds 2021-04-06 16:25:19 +02:00
Jay Foad efc7bf27f5 [AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad 005dcd196e [AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad ce9cca6c3a [AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask
This follows the pattern of the other tryFold* functions. NFC.
2021-04-06 15:23:58 +01:00
Jay Foad cf4f5292f6 [AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef
We are in SSA so getVRegDef is equivalent but simpler. NFC.
2021-04-06 15:23:58 +01:00
Jay Foad e9608a84d8 [AMDGPU][SDag] Add IMG init also for image_gather4 instructions
This fixes an oversight in D99747 which moved the IMG init code from
SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the
hasPostISelHook flag on gather4 instructions.

Differential Revision: https://reviews.llvm.org/D99953
2021-04-06 14:47:20 +01:00
Kerry McLaughlin 7344f3d39a [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
Simon Pilgrim 1dcb5b5e89 [X86] Improve optimizeCompareInstr for signed comparisons after ANDN instructions
Extend D94856 to handle 'andn' instructions as well
2021-04-06 14:16:16 +01:00
Roman Lebedev 31d219d299
[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)
https://alive2.llvm.org/ce/z/67w-wQ

We prefer `add`s over `sub`, and this particular xform
allows further folds to happen:

Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
2021-04-06 15:58:14 +03:00
Simon Pilgrim b8aba76a4e LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI.
Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.
2021-04-06 12:24:20 +01:00
Abhina Sreeskantharajan 82b3e28e83 [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.

Solution:
This patch adds two new flags

  - OF_CRLF which indicates that CRLF translation is used.
  - OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
  if (Flags & OF_CRLF)
    CrtOpenFlags |= _O_TEXT;
```

These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99426
2021-04-06 07:23:31 -04:00
Kerry McLaughlin 857b8a73da [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
Florian Hahn a6b06b785c [VPlan] Print VPValue operands for VPWidenPHI if possible.
For VPWidenPHIRecipes that model all incoming values as VPValue
operands, print those operands instead of printing the original PHI.

D99294 updates recipes of reduction PHIs to use the VPValue for the
incoming value from the loop backedge, making use of this new printing.
2021-04-06 12:11:21 +01:00
Dmitry Preobrazhensky 3eadcb86ab [AMDGPU][MC][GFX9] Corrected SMEM decoding
Corrected SMEM decoding when IMM=0 and OFFSET>127

Fixed bug 49819 (https://bugs.llvm.org/show_bug.cgi?id=49819)

Differential Revision: https://reviews.llvm.org/D99804
2021-04-06 14:10:46 +03:00
Simon Pilgrim 201877d572 [CostModel][X86] Improve accuracy of vXi8 multiply reduction costs
After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....).

Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.
2021-04-06 11:53:22 +01:00
madhur13490 167ea67d76 [IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction
This patch enhances hasAddressTaken() to ignore bitcasts as a
callee in callbase instruction. Such bitcast usage doesn't really take
the address in a useful meaningful way.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D98884
2021-04-06 09:23:46 +00:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Sjoerd Meijer d5f1131c81 [AArch64] Default to zero-cycle-zeroing FP registers
It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this
is most efficient across all cores; it is recognised as a zeroing idiom. For
newer cores, fmov instructions can also be eliminated early and there is no
difference with movi, but some implementations lack this so is not true for
other/older cores. Thus this standardises on using movi as this should always
gives the same or better performance than the fmov with wzr.

Differential Revision: https://reviews.llvm.org/D99586
2021-04-06 09:47:50 +01:00
Sjoerd Meijer ef05b08c61 [AArch64] Use 64-bit movi for zeroing halfs/floats
This was using the .2d variant which zeros 128 bits, but using the .2s variant
that zeros 64 bits is faster on some cores.

This is a prep step for D99586 to always using movi for zeroing floats.

Differential Revision: https://reviews.llvm.org/D99710
2021-04-06 08:42:13 +01:00
Yevgeny Rouban 98742e42fc [NewPM] Fix unused lambda capture build error
Fixes commit 39e3e3aa51d: Redesign of PreserveCFG Checker
2021-04-06 13:14:16 +07:00
Yevgeny Rouban 39e3e3aa51 [NewPM] Redesign of PreserveCFG Checker
The reason for the NewPM redesign is described in the commit
  cba3e783389a: [NewPM] Disable PreservedCFGChecker ...

The checker introduces an internal custom CFG analysis that tracks
current up-to date CFG snapshot. The analysis is invalidated along
any other CFG related analysis (the key is CFGAnalyses). If the CFG
analysis is not invalidated at a functional pass exit then the checker
asserts that the CFG snapshot taken from this analysis is equals to
a snapshot of the current CFG.

Along the way:
- the function CFG::printDiff() is simplified by removing function
  name calculation. The name is printed by the caller;
- fixed CFG invalidated condition (see CFG::invalidate());
- StandardInstrumentations::registerCallbacks() gets additional
  optional parameter of type FunctionAnalysisManager*, which is
  needed by the checker to get the custom CFG analysis;
- several PM related tests updated to explicitly set
  -verify-cfg-preserved=1 as they need.

This patch is safe to land as the CFGChecker is left switched off
(the options -verify-cfg-preserved is false by default). It will be
switched on by a separate patch to minimize possible reverts.

Reviewed By: skatkov, kuhar

Differential Revision: https://reviews.llvm.org/D91327
2021-04-06 12:35:49 +07:00
Serguei Katkov 0057ec8034 [Statepoint] Factor-out utility function to get non-foldable area of STATEPOINT like instructions. NFC
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99875
2021-04-06 11:44:37 +07:00
Craig Topper cb1028a0b9 [RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register.
I missed a few intrinsics in 3dd4aa7d09
when I did this for masked loads and masked segment loads/stores.

Found while trying to share more code between these custom isel
functions.
2021-04-05 21:28:32 -07:00
Philip Reames 58ccbd0d08 Comment adjustments for a rename 2021-04-05 21:07:42 -07:00
Arthur Eubanks ea0e2ca1ac [SROA] Allow SROA on pointers with invariant group intrinsic uses
When we are able to SROA an alloca, we know all uses of it, meaning we
don't have to preserve the invariant group intrinsics and metadata.

It's possible that we could lose information regarding redundant
loads/stores, but that's unlikely to have any real impact since right
now the only user is Clang and vtables.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99760
2021-04-05 19:53:40 -07:00
Philip Reames 13deb6aac7 Exact ashr/lshr don't loose any set bits and are thus trivially invertible
Use that fact to improve isKnownNonEqual.
2021-04-05 19:22:36 -07:00
Philip Reames dc8d864e3a Address minor post commit feedback on 0e59dd 2021-04-05 18:22:17 -07:00
Stanislav Mekhanoshin 30b3aab329 Copy syncscope when expanding atomicrmw into cmpxchg loop
Fixes: SWDEV-280070

Differential Revision: https://reviews.llvm.org/D99902
2021-04-05 17:29:38 -07:00
Sanjay Patel e2a0f512ea [InstSimplify] fix potential miscompile in select value equivalence
This is the sibling fix to c590a9880d -
as there, we can't subsitute a vector value the equality
compare replacement that we are trying requires that the
comparison is true for the entire value. Vector select
can be partly true/false.
2021-04-05 16:52:34 -04:00
Craig Topper 780a47285a [RISCV] Add SDTCisInt to the SDTRVVSlide1 since it is only used for vslide1up.vx/vslide1down.vx.
The scalar type is already marked as XLenVT. The floating point
version would need a different rule.
2021-04-05 13:03:39 -07:00
Craig Topper af2837675a [RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There's still some extra type checks in
the generated table due to some type interference limitations
around HWMode.
2021-04-05 12:57:35 -07:00
Philip Reames b0e59dd6e1 Extract a helper for figuring out if an operator is invertible [nfc]
For use in an uncoming patch.  Left out the phi case (which could otherwise fit in this framework) as it would cause infinite recursion in said patch.  We can probably also leverage this in instcombine to ensure we keep the two sets of related analysis and transforms in sync.
2021-04-05 12:14:21 -07:00
Craig Topper 7edda698c0 [RISCV] Move VSLIDE1UP_VX pattern out of a loop that includes FP types.
FP would need VFSLIDE1UP_VF which uses an FP register.
2021-04-05 12:05:54 -07:00
Ricky Taylor 4db18d62af [M68k] Add support for Motorola literal syntax to AsmParser
These look like $00A0cf for hex and  %001010101 for binary. They are used in Motorola assembly syntax.

Differential Revision: https://reviews.llvm.org/D98519
2021-04-05 20:02:29 +01:00
Tom Stellard 982396ddd7 Revert "Fix build rules for LLVM_WITH_Z3 after D95727"
This reverts commit d66f9c4f1e.

This was a follow up fix for 43ceb74eb1, which
will be reverted.
2021-04-05 10:46:19 -07:00
Cyndy Ishida 0116d04d04 [TextAPI] move source code files out of subdirectory, NFC
TextAPI/ELF has moved out into InterfaceStubs, so theres no longer a
need to seperate out TextAPI between formats.

Reviewed By: ributzka, int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D99811
2021-04-05 10:24:42 -07:00
Ta-Wei Tu 6a82ace5f2 [LoopFusion] Bails out if only the second candidate is guarded (PR48060)
If only the second candidate loop is guarded while the first one is not, fusioning
two loops might not be valid but this check is currently missing.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48060

Reviewed By: sidbav

Differential Revision: https://reviews.llvm.org/D99716
2021-04-06 01:08:56 +08:00
Fraser Cormack af3a839c70 [RISCV] Add support for bitcasts between scalars and fixed-length vectors
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element
vectors to hold the scalar where appropriate.

Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.

Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99667
2021-04-05 17:21:55 +01:00
Sanjay Patel c590a9880d [InstCombine] fix potential miscompile in select value equivalence
As shown in the example based on:
https://llvm.org/PR49832
...and the existing test, we can't substitute
a vector value because the equality compare
replacement that we are attempting requires
that the comparison is true for the entire
value. Vector select can be partly true/false.
2021-04-05 12:25:40 -04:00
John Paul Adrian Glaubitz 62a94b725c [M68k] Mark public functions with the LLVM_EXTERNAL_VISIBILITY macro
In 0dbcb36394, most most target symbols were made hidden by default
with the public ones marked with LLVM_EXTERNAL_VISIBILITY. When the
M68k target was added, this particular change was forgotten so that
external tools cannot make use of the public M68k target functions
in libLLVM.so. Thus, add the missing LLVM_EXTERNAL_VISIBILITY macro
to all public target functions in the M68k backend.

Differential Revision: https://reviews.llvm.org/D99869
2021-04-05 09:24:30 -07:00
Fraser Cormack 3f0df4d7b0 [RISCV] Expand scalable-vector truncstores and extloads
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.

Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99654
2021-04-05 17:03:45 +01:00
Alexey Bataev 00a84f9a7f [SLP]Improve vectorization of the CmpInst instructions.
During vectorization better to postpone the vectorization of the CmpInst
instructions till the end of the basic block. Otherwise we may vectorize
it too early and may miss some vectorization patterns, like reductions.

Reworked part of D57059

Differential Revision: https://reviews.llvm.org/D99796
2021-04-05 06:22:51 -07:00
Alex Orlov 5f57793c4f * NFC. Refactored DIPrinter for better support of new print styles.
This patch introduces a DIPrinter interface to implement by different output style printer implementations. DIPrinterGNU and DIPrinterLLVM implement the GNU and LLVM output style printing respectively. No functional changes.

This refactoring clarifies and simplifies the code, and makes a new output style addition easier.

Reviewed By: jhenderson, dblaikie

Differential Revision: https://reviews.llvm.org/D98994
2021-04-05 15:40:41 +04:00
Simon Pilgrim 36d4f6d7f8 [X86] Fold xor(zext(xor(x,c1)),c2) -> xor(zext(x),xor(zext(c1),c2))
Fixes PR47603 (second case) by extending rG89afec348dbd3e5078f176e978971ee2d3b5dec8
2021-04-05 11:40:37 +01:00
Craig Topper 4708a05da0 [RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled.
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
2021-04-04 17:14:28 -07:00
Roman Lebedev 2760a808b9
[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778)
This is identical to 781d077afb,
but for the other function.

For certain shift amount bit widths, we must first ensure that adding
shift amounts is safe, that the sum won't have an unsigned overflow.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49778
2021-04-04 23:26:41 +03:00
Roman Lebedev dceb3e5996
[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper 2021-04-04 23:26:41 +03:00
Craig Topper 98d5db3e3a [RISCV] Lower orc.b intrinsic to RISCVISD::GORCI.
This will allow us to share any future known bits, demaned bits,
or sign bits improvements.
2021-04-04 12:31:41 -07:00
Sanjay Patel c0645f1324 [InstCombine] fold popcount of exactly one bit to shift
This is discussed in https://llvm.org/PR48999 ,
but it does not solve that request.

The difference in the vector test shows that some
other logic transform is limited to scalar types.
2021-04-04 11:43:49 -04:00
Nikita Popov 9bad7de9a3 [SimplifyCFG] Handle two equal cases in switch to select
When converting a switch with two cases and a default into a
select, also handle the denegerate case where two cases have the
same value.

Generate this case directly as

  %or = or i1 %cmp1, %cmp2
  %res = select i1 %or, i32 %val, i32 %default

rather than

  %sel1 = select i1 %cmp1, i32 %val, i32 %default
  %res = select i1 %cmp2, i32 %val, i32 %sel1

as InstCombine is going to canonicalize to the former anyway.
2021-04-04 17:27:28 +02:00
Nikita Popov 72e0846ef8 [LVI] Don't bail on overdefined value in select
Even if one of the operands is overdefined, we may still produce
a non-overdefined result, e.g. due to a min/max operation. This
matches our handling elsewhere, e.g. for binary operators.

The slot poisoning comment refers to a much older LVI cache
implementation.
2021-04-04 11:11:01 +02:00
Craig Topper a2ea003fcb [RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.

In theory this should allow more target independent optimizations
to remain active.
2021-04-03 23:13:30 -07:00
Juneyoung Lee 5207cde5cb [InstCombine] Conditionally fold select i1 into and/or
This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or:

```
select cond, cond2, false
->
and cond, cond2
```

This is not safe if cond2 is poison whereas cond isn’t.

Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s.
To minimize its impact, this patch conservatively checks whether cond2 is an instruction that
creates a poison or its operand creates a poison.
This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99674
2021-04-04 14:11:28 +09:00
Mircea Trofin b32e76c6d5 [mlgo] fix build rules
This was prompted by D95727, which had the side-effect to break the
'release' mode build bot for ML-driven policies. The problem is that now
the pre-compiled object files don't get transitively carried through as
'source' anymore; that being said, the previous way of consuming them
was problematic, because it was only working for static builds; in
dynamic builds, the whole tf_xla_runtime was linked, which is
undesirable.

The alternative is to treat tf_xla_runtime as an archive, which then
leads to the desired effect.

Differential Revision: https://reviews.llvm.org/D99829
2021-04-03 12:49:03 -07:00
Roman Lebedev 7727cc242d
[NFC][X86] Split VPMOV* AVX2 instructions into their own sched class
At least on all three Zen's, all such instructions cleanly map
into this new class with no overrides needed.
2021-04-03 22:39:07 +03:00
Nikita Popov 665065821e [FastISel] Remove kill tracking
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.

As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.

Differential Revision: https://reviews.llvm.org/D98294
2021-04-03 15:50:13 +02:00
Simon Pilgrim 89afec348d [X86] Fold xor(truncate(xor(x,c1)),c2) -> xor(truncate(x),xor(truncate(c1),c2))
Fixes PR47603

This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.
2021-04-03 12:43:05 +01:00
Simon Pilgrim 7c17f1ea84 [X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper (REAPPLIED)
Use the getTargetShuffleInputs helper for all shuffle decoding

Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.
2021-04-03 11:59:19 +01:00
Bjorn Pettersson d66f9c4f1e Fix build rules for LLVM_WITH_Z3 after D95727
Started to see build errors like this

../lib/Support/Z3Solver.cpp:19:10: fatal error: 'z3.h' file not found
#include <z3.h>
         ^~~~~~
1 error generated.

after commit 43ceb74eb1.

The -isystem path to the Z3_INCLUDE_DIR wen't missing in the compile
commands. No idea why target_include_directories stopped working with
that commit, but using include_directories seem to work better.
2021-04-03 12:25:37 +02:00
Nikita Popov b552e16b0b [Loads] Forward constant vector store to load of first element
InstCombine performs simple forwarding from stores to loads, but
currently only handles the case where the load and store have the
same size. This extends it to also handle a store of a constant
with a larger size followed by a load with a smaller size.

This is implemented through ConstantFoldLoadThroughBitcast() which
is fairly primitive (e.g. does not allow storing a large integer
and then loading a small one), but at least can forward the first
element of a vector store. Unfortunately it seems that we currently
don't have a generic helper for "read a constant value as a different
type", it's all tangled up with other logic in either
ConstantFolding or VNCoercion.

Differential Revision: https://reviews.llvm.org/D98114
2021-04-03 12:10:31 +02:00
Nikita Popov 9d20eaf9c0 [BasicAA] Don't store AATags in cache key (NFC)
The AAMDNodes part of the MemoryLocation is not used by the BasicAA
cache, so don't store it. This reduces the size of each cache entry
from 112 bytes to 48 bytes.
2021-04-03 11:32:01 +02:00
Nikita Popov 17b4e5d456 [BasicAA] Don't pass through AA metadata (NFCI)
BasicAA itself doesn't make use of AA metadata, but passes it
through to recursive queries and makes it part of the cache key.
Aliasing decisions that are based on AA metadata (i.e. TBAA and
ScopedAA) are based *only* on AA metadata, so checking them with
different pointer values or sizes is not useful, the result will
always be the same.

While this change is a mild compile-time improvement by itself,
the actual goal here is to reduce the size of AA cache keys in
a followup change.

Differential Revision: https://reviews.llvm.org/D90098
2021-04-03 11:21:50 +02:00
Simon Pilgrim 4ea5475a3f [KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI.
Include exhaustive test coverage.
2021-04-02 21:44:33 +01:00
Eric Astor 0499a9d688 [ms] [llvm-ml] Accept /WX to signal that warnings should be fatal.
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.

Also make sure that if Warning() returns true, we always treat it as an error.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92504
2021-04-02 15:13:20 -04:00
Levy Hsu f78d932cf2 [RISCV] Add IR intrinsics for Zbc extension
Head files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
clmul
clmulh
clmulr

Differential Revision: https://reviews.llvm.org/D99711
2021-04-02 12:09:13 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Cyndy Ishida 3a223cd4f3 [TextAPI] run clang-format on violating sections, NFC 2021-04-02 11:44:33 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Fangrui Song 8e5f3d04f2 [SLPVectorizer] Fix divide-by-zero after D99719
Will add a test case later.
2021-04-02 11:13:51 -07:00
Eric Astor 15ec0ad77a [ms] [llvm-ml] Fix case-sensitivity for variables and textmacros
Make variables and text-macro references case-insensitive, to match ml.exe.

Also improve error handling for text-macro expansion.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92503
2021-04-02 14:08:02 -04:00
Levy Hsu b001d574d7 [RISCV] Add IR intrinsic for Zbr extension
Implementation for RISC-V Zbr extension intrinsic.

Header files are included in separate patch in case the name needs to be changed

RV32 / 64:
        crc32b
        crc32h
        crc32w
        crc32cb
        crc32ch
        crc32cw

RV64 Only:
        crc32d
        crc32cd

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99009
2021-04-02 10:58:45 -07:00
Craig Topper d7ffa82a8e [RISCV] Improve 64-bit integer constant materialization for more cases.
For positive constants we try shifting left to remove leading zeros
and fill the bottom bits with 1s. We then materialize that constant
shift it right.

This patch adds a new strategy to try filling the bottom bits with
zeros instead. This catches some additional cases.
2021-04-02 10:18:08 -07:00
Sanjay Patel 412fc74140 [InstCombine] fold not+or+neg
~((-X) | Y) --> (X - 1) & (~Y)

We generally prefer 'add' over 'sub', this reduces the
dependency chain, and this looks better for codegen on
x86, ARM, and AArch64 targets.

https://llvm.org/PR45755

https://alive2.llvm.org/ce/z/cxZDSp
2021-04-02 13:16:36 -04:00
Dimitry Andric 6abb92f210 [SCCP] Avoid modifying AdditionalUsers while iterating over it
When run under valgrind, or with a malloc that poisons freed memory,
this can lead to segfaults or other problems.

To avoid modifying the AdditionalUsers DenseMap while still iterating,
save the instructions to be notified in a separate SmallPtrSet, and use
this to later call OperandChangedState on each instruction.

Fixes PR49582.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98602
2021-04-02 19:05:59 +02:00
Florian Hahn 8867fc69f0 [LV] Hoist mapping of IR operands to VPValues (NFC).
This patch moves mapping of IR operands to VPValues out of
tryToCreateWidenRecipe. This allows using existing VPValue operands when
widening recipes directly, which will be introduced in future patches.
2021-04-02 17:57:20 +01:00
Philip Reames 2c4548e18e [rs4gc] Use loops instead of straightline code for attribute stripping [nfc]
Mostly because I'm about to add more attributes and the straightline copies get much uglier.  What's currently there isn't too bad.
2021-04-02 09:25:15 -07:00
Philip Reames a505801e2b [rs4gc] Strip nofree and nosync attributes when lowering from abstract model
The safepoints being inserted exists to free memory, or coordinate with another thread to do so.  Thus, we must strip any inferred attributes and reinfer them after the lowering.

I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.
2021-04-02 09:12:24 -07:00
Brendon Cahoon 09a88278cb [GlobalISel] Allow different types for G_SBFX and G_UBFX operands
Change the definition of G_SBFX and G_UBFX so that the lsb and width
can have different types than the src and dst operands.

Differential Revision: https://reviews.llvm.org/D99739
2021-04-02 11:11:06 -04:00
Nikita Popov 4a3e006830 [LVI] Use range metadata on intrinsics
If we don't know how to handle an intrinsic, we should still
make use of normal call range metadata.
2021-04-02 16:45:31 +02:00
Alexey Bataev 5fcb07a070 [SLP]Fix a bug in min/max reduction, number of condition uses.
The ultimate reduction node may have multiple uses, but if the ultimate
reduction is min/max reduction and based on SelectInstruction, the
condition of this select instruction must have only single use.

Differential Revision: https://reviews.llvm.org/D99753
2021-04-02 07:09:44 -07:00
Nico Weber fa0aff6d69 Revert "[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper"
This reverts commit 500969f1d0.
Makes clang assert compiling avx2 code, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1195353#c4
for a standalone repro.
2021-04-02 09:55:55 -04:00
Jun Ma 274ac9d40e [AArch64][SVE] Lowering sve.dot to DOT node
Differential Revision: https://reviews.llvm.org/D99699
2021-04-02 20:05:17 +08:00
Jun Ma ab3c5fb282 [NFC][SVE] Use SVE_4_Op_Imm_Pat for sve_intx_dot_by_indexed_elem 2021-04-02 20:05:17 +08:00
Jeroen Dobbelaere b82b305cf9 [InstCombine] Fix out-of-bounds ashr(shl) optimization
This fixes a crash found by the oss fuzzer and reported by @fhahn.
The suggestion of @RKSimon seems to be the correct fix here. (See D91343).

The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D99792
2021-04-02 13:45:11 +02:00
Simon Pilgrim 500969f1d0 [X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper
Use the getTargetShuffleInputs helper for all shuffle decoding
2021-04-02 11:50:18 +01:00
Sander de Smalen 0f7bbbc481 Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.

This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.

The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.

This allows us to do away with the following pattern in many of the SVE tests:
  RUN: .... 2>%t
  RUN: cat %t | FileCheck --check-prefix=WARN
  WARN-NOT: warning: ...

The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.

This patch also has fixes the following tests:
 unittests:
   ScalableVectorMVTsTest.SizeQueries
   SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
   AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG

 regression tests:
   Transforms/InstCombine/vscale_gep.ll

Reviewed By: paulwalker-arm, ctetreau

Differential Revision: https://reviews.llvm.org/D98856
2021-04-02 10:55:22 +01:00
Florian Hahn 0f3230390b
[SLP] Better estimate cost of no-op extracts on target vectors.
The motivation for this patch is to better estimate the cost of
extracelement instructions in cases were they are going to be free,
because the source vector can be used directly.

A simple example is

    %v1.lane.0 = extractelement <2 x double> %v.1, i32 0
    %v1.lane.1 = extractelement <2 x double> %v.1, i32 1

    %a.lane.0 = fmul double %v1.lane.0, %x
    %a.lane.1 = fmul double %v1.lane.1, %y

Currently we only consider the extracts free, if there are no other
users.

In this particular case, on AArch64 which can fit <2 x double> in a
vector register, the extracts should be free, independently of other
users, because the source vector of the extracts will be in a vector
register directly, so it should be free to use the vector directly.

The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on
certain AArch64 CPUs.

It looks like this does not impact any code in
SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto.

This originally regressed after D80773, so if there's a better
alternative to explore, I'd be more than happy to do that.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99719
2021-04-02 10:40:12 +01:00
Fraser Cormack 3b48d849d4 [RISCV] Optimize more redundant VSETVLIs
D99717 introduced some test cases which showed that the output of one
vsetvli into another would not be picked up by the RISCVCleanupVSETVLI
pass. This patch teaches the optimization about such a pattern. The
pattern is quite common when using the RVV vsetvli intrinsic to pass the
VL onto other intrinsics.

The second test case introduced by D99717 is left unoptimized by this
patch. It is a rarer case and will require us to rewire any uses of the
redundant vset[i]vli's output to the previous one's.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99730
2021-04-02 10:04:07 +01:00
Evgeniy Brevnov 2388aae401 [NARY-REASSOCIATE] Support reassociation of min/max
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.

Reviewed By: mkazantsev, lebedev.ri

Differential Revision: https://reviews.llvm.org/D88287
2021-04-02 15:30:13 +07:00
Roman Lebedev a26f1bf67e
[PassManager] Run additional LICM before LoopRotate
Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed,
but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation,
that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:
| statistic name                                   | LoopRotate-LICM | LICM-LoopRotate |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                         | 9015930         | 9015799         |  -131 |   0.00% |  0.00% |
| indvars.NumElimCmp                               | 3536            | 3544            |     8 |   0.23% |  0.23% |
| indvars.NumElimExt                               | 36725           | 36580           |  -145 |  -0.39% |  0.39% |
| indvars.NumElimIV                                | 1197            | 1187            |   -10 |  -0.84% |  0.84% |
| indvars.NumElimIdentity                          | 143             | 136             |    -7 |  -4.90% |  4.90% |
| indvars.NumElimRem                               | 4               | 5               |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                                  | 29842           | 29890           |    48 |   0.16% |  0.16% |
| indvars.NumReplaced                              | 2293            | 2227            |   -66 |  -2.88% |  2.88% |
| indvars.NumSimplifiedSDiv                        | 6               | 8               |     2 |  33.33% | 33.33% |
| indvars.NumWidened                               | 26438           | 26329           |  -109 |  -0.41% |  0.41% |
| instcount.TotalBlocks                            | 1178338         | 1173840         | -4498 |  -0.38% |  0.38% |
| instcount.TotalFuncs                             | 111825          | 111829          |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                             | 9905442         | 9896139         | -9303 |  -0.09% |  0.09% |
| lcssa.NumLCSSA                                   | 425871          | 423961          | -1910 |  -0.45% |  0.45% |
| licm.NumHoisted                                  | 378357          | 378753          |   396 |   0.10% |  0.10% |
| licm.NumMovedCalls                               | 2193            | 2208            |    15 |   0.68% |  0.68% |
| licm.NumMovedLoads                               | 35899           | 31821           | -4078 | -11.36% | 11.36% |
| licm.NumPromoted                                 | 11178           | 11154           |   -24 |  -0.21% |  0.21% |
| licm.NumSunk                                     | 13359           | 13587           |   228 |   1.71% |  1.71% |
| loop-delete.NumDeleted                           | 8547            | 8402            |  -145 |  -1.70% |  1.70% |
| loop-instsimplify.NumSimplified                  | 12876           | 11890           |  -986 |  -7.66% |  7.66% |
| loop-peel.NumPeeled                              | 1008            | 925             |   -83 |  -8.23% |  8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize         | 368             | 365             |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                           | 42015           | 42003           |   -12 |  -0.03% |  0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 240             | 242             |     2 |   0.83% |  0.83% |
| loop-simplifycfg.NumLoopExitsDeleted             | 497             | 20              |  -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded            | 618             | 336             |  -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled                | 11028           | 11032           |     4 |   0.04% |  0.04% |
| loop-unroll.NumUnrolled                          | 12608           | 12529           |   -79 |  -0.63% |  0.63% |
| mem2reg.NumDeadAlloca                            | 10222           | 10221           |    -1 |  -0.01% |  0.01% |
| mem2reg.NumPHIInsert                             | 192110          | 192106          |    -4 |   0.00% |  0.00% |
| mem2reg.NumSingleStore                           | 637650          | 637643          |    -7 |   0.00% |  0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814             | 812             |    -2 |  -0.25% |  0.25% |
| scalar-evolution.NumTripCountsComputed           | 283108          | 282934          |  -174 |  -0.06% |  0.06% |
| scalar-evolution.NumTripCountsNotComputed        | 106712          | 106718          |     6 |   0.01% |  0.01% |
| simple-loop-unswitch.NumBranches                 | 5178            | 4752            |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 914             | 503             |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches                 | 20              | 18              |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial                  | 183             | 95              |   -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`),
loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`),
simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?
| statistic name                                | LoopRotate-LICM | LICM-LoopRotate-LICM |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                      | 9015930         | 9014474              | -1456 |  -0.02% |  0.02% |
| indvars.NumElimCmp                            | 3536            | 3546                 |    10 |   0.28% |  0.28% |
| indvars.NumElimExt                            | 36725           | 36681                |   -44 |  -0.12% |  0.12% |
| indvars.NumElimIV                             | 1197            | 1185                 |   -12 |  -1.00% |  1.00% |
| indvars.NumElimIdentity                       | 143             | 146                  |     3 |   2.10% |  2.10% |
| indvars.NumElimRem                            | 4               | 5                    |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                               | 29842           | 29899                |    57 |   0.19% |  0.19% |
| indvars.NumReplaced                           | 2293            | 2299                 |     6 |   0.26% |  0.26% |
| indvars.NumSimplifiedSDiv                     | 6               | 8                    |     2 |  33.33% | 33.33% |
| indvars.NumWidened                            | 26438           | 26404                |   -34 |  -0.13% |  0.13% |
| instcount.TotalBlocks                         | 1178338         | 1173652              | -4686 |  -0.40% |  0.40% |
| instcount.TotalFuncs                          | 111825          | 111829               |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                          | 9905442         | 9895452              | -9990 |  -0.10% |  0.10% |
| lcssa.NumLCSSA                                | 425871          | 425373               |  -498 |  -0.12% |  0.12% |
| licm.NumHoisted                               | 378357          | 383352               |  4995 |   1.32% |  1.32% |
| licm.NumMovedCalls                            | 2193            | 2204                 |    11 |   0.50% |  0.50% |
| licm.NumMovedLoads                            | 35899           | 35755                |  -144 |  -0.40% |  0.40% |
| licm.NumPromoted                              | 11178           | 11163                |   -15 |  -0.13% |  0.13% |
| licm.NumSunk                                  | 13359           | 14321                |   962 |   7.20% |  7.20% |
| loop-delete.NumDeleted                        | 8547            | 8538                 |    -9 |  -0.11% |  0.11% |
| loop-instsimplify.NumSimplified               | 12876           | 12041                |  -835 |  -6.48% |  6.48% |
| loop-peel.NumPeeled                           | 1008            | 924                  |   -84 |  -8.33% |  8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize      | 368             | 365                  |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                        | 42015           | 42005                |   -10 |  -0.02% |  0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted         | 240             | 241                  |     1 |   0.42% |  0.42% |
| loop-simplifycfg.NumTerminatorsFolded         | 618             | 619                  |     1 |   0.16% |  0.16% |
| loop-unroll.NumCompletelyUnrolled             | 11028           | 11029                |     1 |   0.01% |  0.01% |
| loop-unroll.NumUnrolled                       | 12608           | 12525                |   -83 |  -0.66% |  0.66% |
| mem2reg.NumPHIInsert                          | 192110          | 192073               |   -37 |  -0.02% |  0.02% |
| mem2reg.NumSingleStore                        | 637650          | 637652               |     2 |   0.00% |  0.00% |
| scalar-evolution.NumTripCountsComputed        | 283108          | 282998               |  -110 |  -0.04% |  0.04% |
| scalar-evolution.NumTripCountsNotComputed     | 106712          | 106691               |   -21 |  -0.02% |  0.02% |
| simple-loop-unswitch.NumBranches              | 5178            | 5185                 |     7 |   0.14% |  0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 925                  |    11 |   1.20% |  1.20% |
| simple-loop-unswitch.NumTrivial               | 183             | 179                  |    -4 |  -2.19% |  2.19% |
| simple-loop-unswitch.NumBranches              | 5178            | 4752                 |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 503                  |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches              | 20              | 18                   |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial               | 183             | 95                   |   -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity,
also note how none of those 4 regressions are here. Namely:

| statistic name                                   | LICM-LoopRotate | LICM-LoopRotate-LICM |     Δ |        % |   abs(%) |
| asm-printer.EmittedInsts                         | 9015799         | 9014474              | -1325 |   -0.01% |    0.01% |
| indvars.NumElimCmp                               | 3544            | 3546                 |     2 |    0.06% |    0.06% |
| indvars.NumElimExt                               | 36580           | 36681                |   101 |    0.28% |    0.28% |
| indvars.NumElimIV                                | 1187            | 1185                 |    -2 |   -0.17% |    0.17% |
| indvars.NumElimIdentity                          | 136             | 146                  |    10 |    7.35% |    7.35% |
| indvars.NumLFTR                                  | 29890           | 29899                |     9 |    0.03% |    0.03% |
| indvars.NumReplaced                              | 2227            | 2299                 |    72 |    3.23% |    3.23% |
| indvars.NumWidened                               | 26329           | 26404                |    75 |    0.28% |    0.28% |
| instcount.TotalBlocks                            | 1173840         | 1173652              |  -188 |   -0.02% |    0.02% |
| instcount.TotalInsts                             | 9896139         | 9895452              |  -687 |   -0.01% |    0.01% |
| lcssa.NumLCSSA                                   | 423961          | 425373               |  1412 |    0.33% |    0.33% |
| licm.NumHoisted                                  | 378753          | 383352               |  4599 |    1.21% |    1.21% |
| licm.NumMovedCalls                               | 2208            | 2204                 |    -4 |   -0.18% |    0.18% |
| licm.NumMovedLoads                               | 31821           | 35755                |  3934 |   12.36% |   12.36% |
| licm.NumPromoted                                 | 11154           | 11163                |     9 |    0.08% |    0.08% |
| licm.NumSunk                                     | 13587           | 14321                |   734 |    5.40% |    5.40% |
| loop-delete.NumDeleted                           | 8402            | 8538                 |   136 |    1.62% |    1.62% |
| loop-instsimplify.NumSimplified                  | 11890           | 12041                |   151 |    1.27% |    1.27% |
| loop-peel.NumPeeled                              | 925             | 924                  |    -1 |   -0.11% |    0.11% |
| loop-rotate.NumRotated                           | 42003           | 42005                |     2 |    0.00% |    0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 242             | 241                  |    -1 |   -0.41% |    0.41% |
| loop-simplifycfg.NumLoopExitsDeleted             | 20              | 497                  |   477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded            | 336             | 619                  |   283 |   84.23% |   84.23% |
| loop-unroll.NumCompletelyUnrolled                | 11032           | 11029                |    -3 |   -0.03% |    0.03% |
| loop-unroll.NumUnrolled                          | 12529           | 12525                |    -4 |   -0.03% |    0.03% |
| mem2reg.NumDeadAlloca                            | 10221           | 10222                |     1 |    0.01% |    0.01% |
| mem2reg.NumPHIInsert                             | 192106          | 192073               |   -33 |   -0.02% |    0.02% |
| mem2reg.NumSingleStore                           | 637643          | 637652               |     9 |    0.00% |    0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812             | 814                  |     2 |    0.25% |    0.25% |
| scalar-evolution.NumTripCountsComputed           | 282934          | 282998               |    64 |    0.02% |    0.02% |
| scalar-evolution.NumTripCountsNotComputed        | 106718          | 106691               |   -27 |   -0.03% |    0.03% |
| simple-loop-unswitch.NumBranches                 | 4752            | 5185                 |   433 |    9.11% |    9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 503             | 925                  |   422 |   83.90% |   83.90% |
| simple-loop-unswitch.NumSwitches                 | 18              | 20                   |     2 |   11.11% |   11.11% |
| simple-loop-unswitch.NumTrivial                  | 95              | 179                  |    84 |   88.42% |   88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but i think that's basically nothing, and there's potential that it might
be avoidable in the future by fixing clang to produce alignment information
on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249
2021-04-02 11:11:42 +03:00
Wenlei He c5605857bb [CSSPGO] Skip dangling probe value when computing profile summary
Recently we switched to use InvalidProbeCount = UINT64_MAX (instead of 0) to represent dangling probe, but UINT64_MAX is not excluded when computing profile summary. This caused profile summary to produce incorrect hot/cold threshold. The change fixed it by excluding UINT64_MAX from summary builder.

Differential Revision: https://reviews.llvm.org/D99788
2021-04-01 22:49:11 -07:00
Juneyoung Lee c664769330 [AssumeBundles] offset should be added to correctly calculate align
This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492).

Consider this code:

```
call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 -1
; aligment of %arrayidx?
```

The llvm.assume guarantees that `%a - 28` is 32-bytes aligned, meaning that `%a` is 32k + 28 for some k.
Therefore `a - 4` cannot be 32-bytes aligned but the existing code was calculating the pointer as 32-bytes aligned.

The reason why this happened is as follows.
`DiffSCEV` stores `%arrayidx - %a` which is -4.
`OffSCEV` stores the offset value of “align”, which is 28.
`DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = 32 was being used instead.

Reviewed By: Tyker

Differential Revision: https://reviews.llvm.org/D98759
2021-04-02 12:32:05 +09:00
Yang Fan bc6001ce1e
[X86] Fix -Wunused-function warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:9212:13: warning: ‘bool isHorizOp(unsigned int)’ defined but not used [-Wunused-function]
 9212 | static bool isHorizOp(unsigned Opcode) {
      |             ^~~~~~~~~
```
2021-04-02 09:38:12 +08:00
Philip Reames 91790c6785 [indvars[ Fix pr49802 by checking for SCEVCouldNotCompute
The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known.  This used to be true, but when we added handling for dead exits we broke this invariant.  The new invariant is that an exact loop count implies that any exits non trivially dead have exit counts.

We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute.  I chose the second as it was simpler.

(Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...)

p.s. Sorry for the lack of test case.  Getting things into a state to actually hit this is difficult and fragile.  The original repro involves loop-deletion leaving SCEV in a slightly inprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one.  All of my attempts to separate it into a standalone test failed.
2021-04-01 17:53:44 -07:00
Philip Reames b23a314146 [funcattrs] Respect nofree attribute on callsites (not just callee) 2021-04-01 14:45:49 -07:00
Craig Topper 766d27dc85 [RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands.
This occurs when we type legalize an i64 scalar input on RV32. We
need to manually splat, which requires a vector input. Rather
than special case this in lowering just pattern match it.
2021-04-01 14:10:21 -07:00
David Green da98177cda [ARM] Allow v6m runtime loop unrolling
This removes the restriction that only Thumb2 targets enable runtime
loop unrolling, allowing it for Thumb1 only cores as well. The existing
T2 heuristics are used (for the time being) to control when and how
unrolling is performed.

Differential Revision: https://reviews.llvm.org/D99588
2021-04-01 21:21:40 +01:00
Craig Topper dbbc95e3e5 [RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.

This patches switches to the alternative method softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749

I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
The softPromoteHalf would otherwise always give i16 as the
argument type.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D99148
2021-04-01 12:41:57 -07:00
Philip Reames 1e69a5af92 [Attributor] Cleanup detection of non-relaxed atomics in nosync inference
The code was checking for cases which are disallowed by the verifier.  Delete dead code and adjust style.
2021-04-01 12:01:29 -07:00
Philip Reames 8e596f7e27 [Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC]
Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic.  By using the matcher class, we now do.
2021-04-01 11:49:59 -07:00
Philip Reames 6ef4505298 [funcattrs] Infer nosync from readnone and non-convergent
This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (0626367202).

This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive.

Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work.

Differential Revision: https://reviews.llvm.org/D99749
2021-04-01 11:37:34 -07:00
Philip Reames db357891f0 Infer dereferenceability from malloc and friends
Hookup TLI when inferring object size from allocation calls. This allows the analysis to prove dereferenceability for known allocation functions (such as malloc/new/etc) in addition to those marked explicitly with the allocsize attribute.

This is a follow up to 0129cd5 now that the bug fixed by e2c6621e6 is resolved.

As noted in the test, this relies on being able to prove that there is no free between allocation and context (e.g. hoist location). At the moment, this is handled conservatively. I'm working strengthening out ability to reason about no-free regions separately.

Differential Revision: https://reviews.llvm.org/D99737
2021-04-01 11:33:35 -07:00
Martin Storsjö 4391d764e1 [ARM] Remove an unused parameter in ARMWinCOFFObjectWriter. NFC.
This writer only ever operates on 32 bit arm code.

Differential Revision: https://reviews.llvm.org/D99575
2021-04-01 21:25:41 +03:00
Philip Reames ffa15e9463 Extract isVolatile helper on Instruction [NFCI]
We have this logic duplicated in several cases, none of which were exhaustive.  Consolidate it in one place.

I don't believe this actually impacts behavior of the callers.  I think they all filter their inputs such that their partial implementations were correct.  If not, this might be fixing a cornercase bug.
2021-04-01 11:24:02 -07:00
Nick Desaulniers 52338af569 [MC][ARM] add .w suffixes for RSB/RSBS T1
See also:
F5.1.167 RSB, RSBS (register) T1 shift or rotate by value variant
of the Arm ARM.

Link: https://github.com/ClangBuiltLinux/linux/issues/1309

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D99542
2021-04-01 10:45:37 -07:00
Philip Reames 6b05d753e0 Mark unordered memset/memmove/memcpy as nosync
Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.
2021-04-01 10:38:54 -07:00
Craig Topper d157e3f387 [RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32.
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.

We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D99704
2021-04-01 10:38:05 -07:00
Nick Desaulniers 1addc231cd [MC][ARM] add .w suffixes for ORN/ORNS T1
See also:
F5.1.128 ORN, ORNS (register) T1 shift or rotate by value variant
of the Arm ARM.

Link: https://github.com/ClangBuiltLinux/linux/issues/1309

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D99538
2021-04-01 10:27:09 -07:00
Craig Topper b7c2e577cc [RISCV] Add custom type legalization to form MULHSU when possible.
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.

I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds making
it difficult to find both pieces again.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D99479
2021-04-01 10:15:55 -07:00
Jay Foad fdc4f19e2f [AMDGPU] Remove SIAddIMGInit pass which is now unused
Differential Revision: https://reviews.llvm.org/D99748
2021-04-01 18:13:17 +01:00
Jay Foad 3d07a6d891 [AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic
Doing this during instruction selection avoids the cost of running
SIAddIMGInit which is yet another pass over the MIR.

Differential Revision: https://reviews.llvm.org/D99670
2021-04-01 18:13:17 +01:00
Jay Foad 4af6251cea [AMDGPU][SDag] Add IMG init in AdjustInstrPostInstrSelection
Doing this in a post-isel hook avoids the cost of running SIAddIMGInit
which is yet another pass over the MIR.

Differential Revision: https://reviews.llvm.org/D99747
2021-04-01 18:13:17 +01:00
Craig Topper d61b40ed27 [RISCV] Improve 64-bit integer materialization for some cases.
This adds a new integer materialization strategy mainly targeted
at 64-bit constants like 0xffffffff where there are 32 or more trailing
ones with leading zeros. We can materialize these by using an addi -1
and srli to restore the leading zeros. This matches what gcc does.

I haven't limited to just these cases though. The implementation
here takes the constant, shifts out all the leading zeros and
shifts ones into the LSBs, creates the new sequence, adds an srli,
and checks if this is shorter than our original strategy.

I've separated the recursive portion into a standalone function
so I could append the new strategy outside of the recursion. Since
external users are no longer using the recursive function, I've
cleaned up the external interface to return the sequence instead of
taking a vector by reference.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D98821
2021-04-01 09:12:52 -07:00
Jay Foad b1fbfd9e4c [AMDGPU] Small cleanup to constructRetValue and its caller. NFC. 2021-04-01 16:36:16 +01:00
Philip Reames e2c6621e63 [deref-at-point] restrict inference of dereferenceability based on allocsize attribute
Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possibly frees between allocation and use site. (At the moment, we're conservative and only allowing it in functions where we know we can't free.)

This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me).

There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles.

Differential Revision: https://reviews.llvm.org/D95815
2021-04-01 08:34:40 -07:00
Mircea Trofin ce61def529 [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.

The rest of the patch offers support that is the case: instead of  clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler ices.

This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Differential Revision: https://reviews.llvm.org/D98232
2021-04-01 08:33:28 -07:00
Anirudh Prasad 7b921a6747 [AsmParser][SystemZ][z/OS] Add in support to accept "#" as part of an Identifier token
- This patch adds in support to accept the "#" character as part of an Identifier.
- This support is needed especially for the HLASM dialect since "#" is treated as part of the valid "Alphabet" range
- The way this is done is by making use of the previous precedent set by the `AllowAtInIdentifier` field in `MCAsmLexer.h`. A new field called `AllowHashInIdentifier` is introduced.
- The static function `IsIdentifierChar` is also updated to accept the `#` character if the `AllowHashInIdentifier` field is set to true.
Note: The field introduced in `MCAsmLexer.h` could very well be moved to `MCAsmInfo.h`. I'm not opposed to it. I decided to put it in `MCAsmLexer` since there seems to be some sort of precedent already with `AllowAtInIdentifier`.

Reviewed By: abhina.sreeskantharajan, nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D99277
2021-04-01 11:24:43 -04:00
Bradley Smith 2f45e632c0 [AArch64][SVE] Improve codegen for select nodes with fixed types
Additionally, move the existing fixed vselect tests to *-vselect.ll.

Differential Revision: https://reviews.llvm.org/D99418
2021-04-01 15:54:37 +01:00
Bradley Smith 0934fa4f5d [AArch64][SVE] SVE functions should use the SVE calling convention for fast calls
When an SVE function calls another SVE function using the C calling
convention we use the more efficient SVE VectorCall PCS.  However,
for the Fast calling convention we're incorrectly falling back to
the generic AArch64 PCS.

This patch adds the same "can use SVE vector calling convention"
detection used by CallingConv::C to CallingConv::Fast.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D99657
2021-04-01 15:52:08 +01:00
Brendon Cahoon 65c8bfb509 [AMDGPU] Enable output modifiers for double precision instructions
Update SIFoldOperands pass to recognize v_add_f64 and v_mul_f64
instructions for folding output modifiers.

Differential Revision: https://reviews.llvm.org/D99505
2021-04-01 10:08:17 -04:00
Alexey Bataev c03696da5e [SLP]Improve and fix getVectorElementSize.
1. Need to cleanup InstrElementSize map for each new tree, otherwise might
use sizes from the previous run of the vectorization attempt.
2. No need to include into analysis the instructions from the different basic
   blocks to save compile time.

Differential Revision: https://reviews.llvm.org/D99677
2021-04-01 06:51:26 -07:00
Simon Pilgrim 77d625f8d8 [DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements
If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well.

This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.
2021-04-01 14:33:00 +01:00
Alexey Bataev ce98a0556a [SLP]Remove `else` after `return`, NFC.` 2021-04-01 05:33:01 -07:00
Dmitry Preobrazhensky cd953434f2 [AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffices
Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645.

Differential Revision: https://reviews.llvm.org/D99413
2021-04-01 14:21:00 +03:00
Simon Pilgrim abbe80fa52 [X86][SSE] Fold HOP(HOP(X,X),HOP(Y,Y)) -> HOP(PERMUTE(HOP(X,Y)),PERMUTE(HOP(X,Y))
For slow-hop targets, attempt to merge HADD/SUB pairs used in chains.
2021-04-01 11:54:10 +01:00
Simon Pilgrim 301319840e [X86][SSE] Enable (F)HADD/SUB handling to SimplifyMultipleUseDemandedVectorElts
Attempt to bypass unused horiz-op operands.

This is very similar to the PACKSS/PACKUS handling - we should try to merge these.
2021-04-01 11:54:09 +01:00
Simon Pilgrim f7aeaced65 [X86][SSE] Add isHorizOp helper function. NFCI. 2021-04-01 11:54:09 +01:00
Dmitry Preobrazhensky 0f5ebbcc7f [AMDGPU][MC] Added flag to identify VOP instructions which have a single variant
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.

This fix simplifies handling of single VOP instructions by adding a dedicated flag.

Differential Revision: https://reviews.llvm.org/D99408
2021-04-01 13:53:12 +03:00
Yevgeny Rouban 1ed53d44d8 [LoopFlatten] Do not report CFG analyses as up-to-date
Removes CFGAnalyses from the preserved analyses set
returned by LoopFlattenPass::run().

Reviewed By: Dave Green, Ta-Wei Tu

Differential Revision: https://reviews.llvm.org/D99700
2021-04-01 15:52:36 +07:00
Sam Parker 92e7771483 [WebAssembly] Invert branch condition on xor input
A frequent pattern for floating point conditional branches use an xor
to invert the input for the branch. Instead we can fold away the xor
by swapping the branch target instead.

Differential Revision: https://reviews.llvm.org/D99171
2021-04-01 09:23:28 +01:00
Max Kazantsev a1d83776bf [NFC] Undo some erroneous renamings
Some vars renamed by mistake during auto-replacements. Undoing them.
2021-04-01 13:10:10 +07:00
Max Kazantsev 630818a850 [NFC] Disambiguate LI in GVN
Name GVN uses name 'LI' for two different unrelated things:
LoadInst and LoopInfo. This patch relates the variables with
former meaning into 'Load' to disambiguate the code.
2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro 5fac7c6046 [GVN] Propagate llvm.access.group metadata of loads
Before this change, the `llvm.access.group` metadata was dropped
when moving a load instruction in GVN. This prevents vectorizing
a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`.
This change propagates the metadata as well as other metadata if
it is safe (the move-destination basic block and source basic
block belong to the same loop).

Differential Revision: https://reviews.llvm.org/D93503
2021-04-01 10:00:48 +09:00
qixingxue 62b74f7564 [GVN][NFC] Refactor analyzeLoadFromClobberingWrite
This commit adjusts the order of two swappable if statements to
make code cleaner.

Reviewed By: lattner, nikic
Differential Revision: https://reviews.llvm.org/D99648
2021-04-01 08:35:35 +08:00
Philip Reames 4af4828a6e [ValueTracking] Handle non-zero ashr/lshr recurrences
If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.
2021-03-31 16:48:32 -07:00
Philip Reames 115a42ad1e Add debug printers for KnownBits [nfc] 2021-03-31 15:36:07 -07:00
Simonas Kazlauskas 777a58e05b Support {S,U}REMEqFold before legalization
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.

Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88785
2021-04-01 01:35:41 +03:00
Craig Topper c88ee1a094 [RISCV] Add UnsupportedSchedZfh multiclass to reduce duplicate lines from RISCVSchedRocket.td and RISCVSchedSiFive7.td. NFC 2021-03-31 15:06:14 -07:00
YangKeao 1c268a8ff4 [X86] add dwarf annotation for inline stack probe
While probing stack, the stack register is moved without dwarf
information, which could cause panic if unwind the backtrace.
This commit only add annotation for the inline stack probe case.
Dwarf information for the loop case should be done in another
patch and need further discussion.

Reviewed By: nagisa

Differential Revision: https://reviews.llvm.org/D99579
2021-04-01 00:32:50 +03:00
Roman Lebedev 43ded90094
[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader 2021-03-31 23:27:36 +03:00
Craig Topper 9e00b6660d [SelectionDAG] Remove unneeded vector resize from the end of FoldConstantArithmetic. NFC
There's an assert right before that makes sure the size already matches.
Earlier in this function's life, scalars and vectors shared more
code.
2021-03-31 12:33:10 -07:00
George Mitenkov 807b019ca2 [ConstantFolding] Fixing addo/subo with undef
When folding addo/subo with undef, the current
convention is to use { -1, false } for addo and
{ 0, false } for subo. This was fixed for InstSimplify in
https://reviews.llvm.org/rGf094d65beaa492e845b03561eddd75b5be653a01,
but not in ConstantFolding.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D99564
2021-03-31 21:47:29 +03:00
Huihui Zhang fe5c4a06a4 [LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism.
Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this
can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
--extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM test consecutive-ptr-uniforms.ll .

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99549
2021-03-31 11:21:07 -07:00
Thomas Lively 45783d0e8a [WebAssembly] Implement i64x2 comparisons
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.

Differential Revision: https://reviews.llvm.org/D99623
2021-03-31 10:46:17 -07:00
Juneyoung Lee df0b97dab0 [ValueTracking] Add with.overflow intrinsics to poison analysis functions
This is a patch teaching ValueTracking that `s/u*.with.overflow` intrinsics do not
create undef/poison and they propagate poison.
I couldn't write a nice example like the one with ctpop; ValueTrackingTest.cpp were simply updated
to check these instead.
This patch helps reducing regression while fixing https://llvm.org/pr49688 .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99671
2021-04-01 02:41:38 +09:00
Philip Reames ae7b1e8823 [SCEV] Handle unreachable binop when matching shift recurrence
This fixes an issue introduced with my change d4648e, and reported in pr49768.

The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code.  Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.
2021-03-31 10:33:34 -07:00
Craig Topper 437958d9fd [X86] Improve SMULO/UMULO codegen for vXi8 vectors.
The default expansion creates a MUL and either a MULHS/MULHU. Each
of those separately expand to sequences that use one or more
PMULLW instructions as well as additional instructions to
extend the types to vXi16. The MULHS/MULHU expansion computes the
whole 16-bit product, but only keeps the high part.

We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU
expansion, but keep both the high and low parts. And we can use
those parts to calculate the overflow.

For AVX512 we might have vXi1 overflow outputs. We can improve those by using
vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better
than truncating the high result to use vpcmpeqb. If we don't have avx512bw we
can extend up to v16i32 to use vpcmpeqd to produce a k register.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97624
2021-03-31 10:13:50 -07:00
Shimin Cui 00c0c8c87d [PowerPC] [MLICM] Enable hoisting of caller preserved registers on AIX
On ppc64 linux , MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This is to enable this on AIX for both ppc64 and ppc32.

Differential Revision: https://reviews.llvm.org/D99076
2021-03-31 12:46:25 -04:00
Craig Topper 50b8634a99 [X86] Improve optimizeCompareInstr for signed comparisons after BMI/TBM instructions
We previously couldn't optimize out a TEST if the branch/setcc/cmov
used the overflow flag. This patches allows the TEST to be removed
if the flag producing instruction is known to clear the OF flag.
Thats what the TEST instruction would have done so that should be
equivalent.

Need to add test cases. I'll try to get back to this if I have bandwidth.

Fixes PR48768.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D94856
2021-03-31 09:45:29 -07:00
Wael Yehia 563cdeaafd [LTO][Legacy] Decouple option parsing from LTOCodeGenerator
in this patch we add a new libLTO API to specify debug options independent of an lto_code_gen_t.
This allows clients to pass codegen flags (through libLTO) which otherwise today are ignored.

Reviewed By: steven_wu

Differential Revision: https://reviews.llvm.org/D92611
2021-03-31 16:43:26 +00:00
Craig Topper 2a8b7cab6a [RISCV] Add RISCVISD opcodes for CLZW and CTZW.
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.

CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.

Reviewed By: asb, HsiangKai

Differential Revision: https://reviews.llvm.org/D99317
2021-03-31 09:40:07 -07:00
Craig Topper 04f10ab367 [RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate
Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens
to be INT64_MIN. I don't think the compiler would optimize based on that in this
usage, but it could fail UBSan or -ftrapv.

Reviewed By: HsiangKai, frasercrmck

Differential Revision: https://reviews.llvm.org/D99637
2021-03-31 09:26:41 -07:00
Sanjay Patel 1462bdf1b9 [InstCombine] fold abs(srem X, 2)
This is a missing optimization based on an example in:
https://llvm.org/PR49763

As noted there and the test here, we could add a more
general fold if that is shown useful.

https://alive2.llvm.org/ce/z/xEHdTv
https://alive2.llvm.org/ce/z/97dcY5
2021-03-31 11:29:20 -04:00
Sander de Smalen 7108b2dec1 [SVE] Fix LoopVectorizer test scalalable-call.ll
This marks FSIN and other operations to EXPAND for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00
Sander de Smalen 2f6f249a49 NFC: Change getIntrinsicInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97468

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97469
2021-03-31 14:04:41 +01:00
Sander de Smalen 2f56e1c6b1 NFC: Change getTypeBasedIntrinsicCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97466

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97468
2021-03-31 14:04:41 +01:00
Liqiang Tao d2d6720a93 [InlineCost] Remove TODO comment that consider other forms of savings in the cost-benefit analysis
Attempts to compute savings more accurately cannot impact the set of critically important call sites.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D98577
2021-03-31 20:11:32 +08:00
Roman Lebedev ce548aa236
[X86] AMD Zen 3 has macro fusion
This is an improvement over Zen 2, where only branch fusion is supported,
as per Agner, 21.4 Instruction fusion.
AMD SOG 17h has no mention of fusion.

AMD SOG 19h, 2.9.3 Branch Fusion
The following flag writing instructions support branch fusion
with their reg/reg, reg/imm and reg/mem forms
* CMP
* TEST
* SUB
* ADD
* INC (no fusion with branches dependent on CF)
* DEC (no fusion with branches dependent on CF)
* OR
* AND
* XOR

Agner, 22.4 Instruction fusion
<...> This applies to CMP, TEST, ADD, SUB, AND, OR, XOR, INC, DEC and
all conditional jumps, except if the arithmetic or logic instruction has a rip-relative address or
both an address displacement and an immediate operand.
2021-03-31 14:31:50 +03:00
Fraser Cormack 10fc6e4358 [RISCV] Add support for the stepvector intrinsic
This adds almost everything required for supporting the new stepvector
intrinsic on RVV. It is lowered to the existing VID_VL SDNode.

The only exception is a limitation that RV32 cannot yet lower the
intrinsic on i64 vectors. This is because the step operand is
(currently) required to be at least as large as the vector element type.
I will look into patching that out and loosening the requirement to only
an integer pointer type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99594
2021-03-31 11:41:17 +01:00
Jay Foad 5d0e9ddfa5 [AMDGPU][GlobalISel] Add support for global atomicrmw fadd
This includes gfx908 which only has a no-return version of the
global_atomic_add_f32 instruction, using the same hack that was
previously implemented for selecting from the
llvm.amdgcn.global.atomic.fadd intrinsic.

Differential Revision: https://reviews.llvm.org/D97767
2021-03-31 11:13:00 +01:00
Florian Hahn 52e015081a
[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector.
Currently the code only checks for integer constants (ConstantSDNode)
and triggers an infinite cycle for single-element floating point
vector constants. We need to check for both FP and integer constants.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D99384
2021-03-31 10:17:36 +01:00
Sander de Smalen 3ccbd4f3c7 NFC: Change getUserCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97382

Reviewed By: ctetreau, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97466
2021-03-31 10:13:09 +01:00
Lang Hames 0269a407f3 [JITLink] Switch from StringRef to ArrayRef<char>, add some generic x86-64 utils
Adds utilities for creating anonymous pointers and jump stubs to x86_64.h. These
are used by the GOT and Stubs builder, but may also be used by pass writers who
want to create pointer stubs for indirection.

This patch also switches the underlying type for LinkGraph content from
StringRef to ArrayRef<char>. This avoids any confusion when working with buffers
that contain null bytes in the middle like, for example, a newly added null
pointer content array. ;)
2021-03-30 21:07:24 -07:00
Chuanqi Xu eb51dd719f [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2
Summary: Try to insert dbg.declare to entry.resume basic block in resume
function. In this way, we could print alloca such as __promise in
gdb/lldb under O2, which would be beneficial to debug coroutine program.

Test Plan: check-llvm

Reviewed by: aprantl

Differential Revision: https://reviews.llvm.org/D96938
2021-03-31 10:37:06 +08:00
Fangrui Song 3e5ee194c0 [SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after 431a40e1e2 2021-03-30 19:27:10 -07:00
Craig Topper 5db19cc010 [RISCV] simm12_plus1 should not inherit from Operand. NFC
We only use this in Pat patterns, so it just needs to be an
ImmLeaf. If we did need it as an instruction operand, the
ParserMatchClass, EncoderMethod, and DecoderMethod were probably wrong.
2021-03-30 19:02:11 -07:00
Yang Fan 0d7fd9f0d0
[GlobalISel] Fix Wint-in-bool-context warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp: In member function ‘bool llvm::CombinerHelper::matchFunnelShiftToRotate(llvm::MachineInstr&)’:
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:3882:35: warning: ?: using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
 3882 |       Opc == TargetOpcode::G_FSHL ? TargetOpcode::G_ROTL : TargetOpcode::G_ROTR;
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
2021-03-31 09:59:43 +08:00
Craig Topper 05998701b9 [RISCV] Remove some unused ImmLeafs. NFC
These got left behind when we switched RV32 to use selectImm to
match RV64.
2021-03-30 18:54:11 -07:00
Juneyoung Lee 431a40e1e2 [LoopUnswitch] Assert that branch condition is either and/or but not both
as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321
2021-03-31 10:35:22 +09:00
Craig Topper f59ba0849f [StructLayout] Use TrailingObjects to allocate space for MemberOffsets.
MemberOffsets are stored at the end of StructLayout. The class
contains a single entry array to mark the start of the member
offsets. getStructLayout calculates the additional space needed
for additional elements before allocating memory.

This patch converts this to use TrailingObjects. This simplifies
the size computation in getStructLayout and gets rid of the
single entry array.

This is prep work, but to use TypeSize instead of uint64_t for
D98169. The single entry array doesn't work with TypeSize because
TypeSize doesn't have a default constructor. We thought this
change was an improvement by itself so we've separated it out.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D99608
2021-03-30 17:36:50 -07:00
Heejin Ahn 144ec1c38e [WebAssembly] Encode numbers in ULEB128 in event section
The number of events and the type index should be encoded in ULEB128,
but they were incorrctly encoded in LEB128. The smallest number with
which its LEB128 and ULEB128 encodings are different is 64.
There's no way we can generate 64 events in the C++ toolchain
implementation so we can't test that, but the attached test tests when
the type index is 64.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D99627
2021-03-30 16:21:58 -07:00
Wei Mi d535a05ca1 [ThinLTO] During module importing, close one source module before open
another one for distributed mode.

Currently during module importing, ThinLTO opens all the source modules,
collect functions to be imported and append them to the destination module,
then leave all the modules open through out the lto backend pipeline. This
patch refactors it in the way that one source module will be closed before
another source module is opened. All the source modules will be closed after
importing phase is done. It will save some amount of memory when there are
many source modules to be imported.

Note that this patch only changes the distributed thinlto mode. For in
process thinlto mode, one source module is shared acorss different thinlto
backend threads so it is not changed in this patch.

Differential Revision: https://reviews.llvm.org/D99554
2021-03-30 14:37:29 -07:00
David Green 3a6365a439 [ARM] Add FeatureHasNoBranchPredictor for Thumb1 cores
Mark v6m/v8m-baseline cores as having no branch predictors. This should
not alter very much on its own, but is more correct as the cores do not
have branch predictors and can help in the future.
2021-03-30 21:45:26 +01:00
Fangrui Song 73adc05ced [GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D99463 2021-03-30 12:52:56 -07:00
Sanjay Patel c2ebad8d55 [InstCombine] add fold for demand of low bit of abs()
This is one problem shown in https://llvm.org/PR49763

https://alive2.llvm.org/ce/z/cV6-4K
https://alive2.llvm.org/ce/z/9_3g-L
2021-03-30 15:14:37 -04:00
Huihui Zhang d857a81437 [VPlan] Use SetVector for VPExternalDefs to prevent non-determinism.
Use SetVector instead of SmallPtrSet for external definitions created for VPlan.
Doing this can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
 --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM-Unit test VPRecipeTest.dump.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99544
2021-03-30 12:10:56 -07:00
Amara Emerson a35c2c7942 [GlobalISel] Implement fewerElements legalization for vector reductions.
This patch adds 3 methods, one for power-of-2 vectors which use tree
reductions using vector ops, before a final reduction op. For non-pow-2
types it generates multiple narrow reductions and combines the values with
scalar ops.

Differential Revision: https://reviews.llvm.org/D97163
2021-03-30 11:19:21 -07:00
Amara Emerson 1bc90847ee [AArch64][GlobalISel] Define some legalization rules for G_ROTR and G_ROTL.
For imported pattern purposes, we have a custom rule that promotes the rotate
amount to 64b as well.

Differential Revision: https://reviews.llvm.org/D99463
2021-03-30 11:11:19 -07:00
Amara Emerson 91887cd4ec [AArch64][GlobalISel] Combine funnel shifts to rotates.
Differential Revision: https://reviews.llvm.org/D99388
2021-03-30 11:00:36 -07:00
Sourabh Singh Tomar f13f050551 [DebugInfo] Support for signed constants inside DIExpression
Negative numbers are represented using DW_OP_consts along with signed representation
of the number as the argument.

Test case IR is generated using Fortran front-end.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99273
2021-03-30 23:20:38 +05:30
spupyrev 22998738e8 [SamplePGO] Keeping prof metadata for IndirectBrInst
Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot.
This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D99550
2021-03-30 10:44:48 -07:00
Hongtao Yu 3e3fc431df [CSSPGO] Top-down processing order based on full profile.
Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example:

1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them.

2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining.

3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph.

4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. while this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions.

Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case 4.

I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmark such as Xalanbmk and GCC, the win is bigger, above 2%.

The change is an enhancement to https://reviews.llvm.org/D95988.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D99351
2021-03-30 10:42:22 -07:00
Jessica Paquette 700431128e [GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.

This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.

Differential Revision: https://reviews.llvm.org/D99230
2021-03-30 10:14:30 -07:00
Craig Topper a33fcafaf0 [RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.

Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.

If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.

With this patch the value will always in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.

I've based this implementation off of how this is done on ARM.

I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D98670
2021-03-30 09:47:54 -07:00
Amara Emerson f5e9be6fdb [GlobalISel] Implement lowering for G_ROTR and G_ROTL.
This is a straightforward port.

Differential Revision: https://reviews.llvm.org/D99449
2021-03-30 09:44:41 -07:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Craig Topper f069000b43 [RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV.
After D98939, this is done by LegalizeVectorOps making this code dead.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99519
2021-03-30 09:11:56 -07:00
Sebastian Neubauer 1c3b74f0ab [AMDGPU] Remove outdated TODOs. NFC
spillSGPRToVGPR is already respected in these places since D95768.

Differential Revision: https://reviews.llvm.org/D99570
2021-03-30 15:18:49 +02:00
Krasimir Georgiev c51e91e046 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5178ffc7cf.

Compiling `llvm-profdata` with a compiler build from this produces a
crashing binary.
2021-03-30 14:13:37 +02:00
Sanjay Patel e694e19a79 [x86] enhance matching of pmaddwd
This was crashing with the example from:
https://llvm.org/PR49716
...and that was avoided with a283d72583 ,
but as we can see from the SSE vs. AVX test code diff,
we can try harder to match the pattern.

This matcher code was adapted from another pmadd pattern
match in D49636, but it needs different ops to deal with
size mismatches.

Differential Revision: https://reviews.llvm.org/D99531
2021-03-30 07:28:33 -04:00
Juneyoung Lee 6b4b1dc6ec [LoopUnswitch] Simplify branch condition if it is select with constant operands
This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 .

`select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later
transformations confused.
Simplify the branch condition to not have the form.
2021-03-30 20:09:42 +09:00
Sander de Smalen f71ed5dfe2 NFC: Migrate PartialInlining to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97382
2021-03-30 11:59:45 +01:00
David Green d4b3380dfe [ARM] Handle Splats in MVE lane interleaving
As another addition to MVE lane interleaving, this handles Splat shuffle
vectors, as the shuffle of a splat is a splat.

Differential Revision: https://reviews.llvm.org/D97291
2021-03-30 11:19:16 +01:00
David Sherwood a08c7736a7 [LoopVectorize] Add support for scalable vectorization of induction variables
This patch adds support for the vectorization of induction variables when
using scalable vectors, which required the following changes:

1. Removed assert from InnerLoopVectorizer::getStepVector.
2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use
   a runtime determined value for VF and removed an assert.
3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable
   vectors. I did this by calculating the full vector value for each Part
   of the unroll factor (UF) and caching this in the VP state. This means
   that we are always able to extract an arbitrary element from the vector
   if necessary. In addition to this, I also permitted the caching of the
   individual lane values themselves for the known minimum number of elements
   in the same way we do for fixed width vectors. This is a further
   optimisation that improves the code quality since it avoids unnecessary
   extractelement operations when extracting the first lane.
4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while
   testing some code paths I noticed this is currently broken for scalable
   vectors.

Various tests to support different cases have been added here:

  Transforms/LoopVectorize/AArch64/sve-inductions.ll

Differential Revision: https://reviews.llvm.org/D98715
2021-03-30 11:13:31 +01:00
Krasimir Georgiev 8e7df996e3 Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
This reverts commit 92ddd3c1b6.

Causes multistage clang crashes, e.g.:
https://lab.llvm.org/buildbot/#/builders/36/builds/6678
2021-03-30 11:47:12 +02:00
Joe Ellis a7dde4c5f7 [AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98496
2021-03-30 09:37:11 +00:00
Joe Ellis c4d39f64d0 [AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98625
2021-03-30 09:35:44 +00:00
Bing1 Yu 0c63b862c4 Revert "[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation"
This reverts commit 275df61f04.
2021-03-30 16:33:07 +08:00
Sander de Smalen 4ca860742d [InstructionCost] Don't conflate Invalid costs with Unknown costs.
We previously made a change to getUserCost to return a Invalid cost
when one of the TTI costs returned '-1' (meaning 'unknown' or
'infinitely expensive'). It makes no sense to say that:

  shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

has an invalid cost. Perhaps the cost is not known, but the IR is valid
and can be code-generated. Invalid should only be used for IR that
cannot possibly be code-generated and where a cost is nonsensical.

With more passes now asserting that the cost must be valid, it is possible
that those assertions will fail for perfectly valid IR. An incomplete
cost-model probably shouldn't be a reason for the compiler to break.

It's better to consider these costs as 'very expensive' and ignore them
for other reasons. At some point, we should consider replacing -1 with
some other mechanism.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D99502
2021-03-30 09:29:42 +01:00
Bing1 Yu 275df61f04 [X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D99244
2021-03-30 16:21:10 +08:00
Stefan Gränitz c352a2b829 [lli] Add option -lljit-platform=Inactive to disable platform support explicitly
This option tells LLJIT to disable platform support explicitly: JITDylibs aren't scanned for special init/deinit symbols and no runtime API interposes are injected.
It's useful in two cases: for platforms that don't have such requirements and platforms for which we have no explicit support yet and that don't work well with the generic IR platform.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D99416
2021-03-30 09:29:45 +02:00
Han Zhu 92ddd3c1b6 [loop-idiom] Hoist loop memcpys to loop preheader
For a simple loop like:
```
struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```

Reviewed By: zino

Differential Revision: https://reviews.llvm.org/D97667
2021-03-29 23:36:26 -07:00
Han Zhu 2bd4049ceb Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
This reverts commit deb5095833.

Bad commit message.
2021-03-29 23:35:35 -07:00
Han Zhu deb5095833 [loop-idiom] Hoist loop memcpys to loop preheader
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Blame Revision:

Differential Revision: https://phabricator.intern.facebook.com/D26380397
2021-03-29 23:14:42 -07:00
Alok Kumar Sharma 9fb0025f70 [DebugInfo] Upgrade DISubragne::count to accept DIExpression also
This is needed for Fortran assumed shape arrays whose dimensions are
defined as,
  - 'count' is taken from array descriptor passed as parameter by
    caller, access from descriptor is defined by type DIExpression.
  - 'lowerBound' is defined by callee.
The current alternate way represents using upperBound in place of
count, where upperBound is calculated in callee in a temp variable
using lowerBound and count

Representation with count (DIExpression) is not only clearer as
compared to upperBound (DIVariable) but it has another advantage that
variable count is accessed by being parameter has better chance of
survival at higher optimization level than upperBound being local
variable.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99335
2021-03-30 09:16:55 +05:30
Rahman Lavaee 90c401cab6 [Propeller] Do not generate the BB address map for empty functions.
Empty functions (functions with no real code) are irrelevant for propeller optimizations and their addresses sometimes conflict with other functions which obfuscates the analysis.
This simple change skips the BB address map emission for such functions.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D99395
2021-03-29 20:15:01 -07:00
Jun Ma 65462a08bf [NFC][SVE] Remove redundant pattern 2021-03-30 10:35:08 +08:00
Jun Ma 1af373c673 [AArch64][SVE] Codegen dup_lane for dup(vector_extract)
Differential Revision: https://reviews.llvm.org/D99324
2021-03-30 10:35:08 +08:00
Jun Ma b0db2dbc29 [AArch64][SVEIntrinsicOpts] Optimize tbl+dup into dup+extractelement
Differential Revision: https://reviews.llvm.org/D99412
2021-03-30 10:35:08 +08:00
Evandro Menezes fd94cfeeb5 [RISCV] Move scheduling resources for B into a separate file (NFC)
Differential Revision: https://reviews.llvm.org/D99557
2021-03-29 20:37:22 -05:00
Adrian Prantl 8573c28a51 Add debug support for set types
This commit adds debugging support for set types defined in languages
such as Pascal and Modula-2.

Patch by Peter McKinna!

Differential Revision: https://reviews.llvm.org/D76115
2021-03-29 18:04:48 -07:00
Thomas Lively a1b8b0739a [WebAssembly] Fix i8x16.popcnt opcode
When I updated the SIMD opcodes in f5764a8654, I accidentally missed updating
i8x16.popcnt. This patch fixes the omission.

Differential Revision: https://reviews.llvm.org/D99536
2021-03-29 17:23:15 -07:00
Huihui Zhang ca721042f1 [IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism.
Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this
can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
--extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM test profile-context-tracker-debug.ll .

Reviewed By: MaskRay, wenlei

Differential Revision: https://reviews.llvm.org/D99547
2021-03-29 16:37:10 -07:00
Jonas Devlieghere e0577b3130 [dsymutil] Relocate DW_TAG_label
dsymutil is not relocating the DW_AT_low_pc for a DW_TAG_label. This
patch fixes that and adds a test.

Differential revision: https://reviews.llvm.org/D99534
2021-03-29 15:45:48 -07:00
Gulfem Savrun Yeniceri 5178ffc7cf [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-29 21:53:32 +00:00
Florian Hahn 482283042f
[AArch64] Remove custom zext/sext legalization code.
Currently performExtendCombine assumes that the src-element bitwidth * 2
is a valid MVT. But this is not the case for i1 and it causes a crash on
the v64i1 test cases added in this patch.

It turns out that this code appears to not be needed; the same patterns are
handled by other code and we end up with the same results, even without the
custom lowering. I also added additional test cases in a50037aaa6.

Let's just remove the unneeded code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99437
2021-03-29 22:22:05 +01:00
Nikita Popov 7669455df4 [X86][FastISel] Fix with.overflow eflags clobber (PR49587)
If the successor block has a phi node, then additional moves may
be inserted into predecessors, which may clobber eflags. Don't try
to fold the with.overflow result into the branch in that case.

This is done by explicitly checking for any phis in successor
blocks, not sure if there's some more principled way to address
this. Other fused compare and branch patterns avoid the issue by
emitting the comparison when handling the branch, so that no
instructions may be inserted in between. In this case, the
with.overflow call is emitted separately (and I don't think this
is avoidable, as it will generally have at least two users).

Fixes https://bugs.llvm.org/show_bug.cgi?id=49587.

Differential Revision: https://reviews.llvm.org/D98600
2021-03-29 23:08:47 +02:00
Stanislav Mekhanoshin 619b88849e [AMDGPU] Fix "Sequence" spelling. NFC. 2021-03-29 12:11:36 -07:00
Joe Nash 45fd7c02af Revert "[AMDGPU] Mark additional VOP3 as commutable"
This reverts commit d35d8da7d6.
2021-03-29 14:48:11 -04:00
Joe Nash d35d8da7d6 [AMDGPU] Mark additional VOP3 as commutable
Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of more instructions.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D99376

Change-Id: I61e20490962d95ea429beb355c55f55c024dafdc
2021-03-29 14:22:20 -04:00
Roger Ferrer Ibanez 489ca73ac4 [PrologEpilogInserter][AMDGPU] Only adjust offset for emergency spill slots if the stack grows down
D89239 adjusts the stack offset of emergency spill slots for overaligned
stacks. However the adjustment is not valid for targets whose stack
grows up (such as AMDGPU).

This change makes the adjustment conditional only to those targets whose
stack grows down.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49686

Differential Revision: https://reviews.llvm.org/D99504
2021-03-29 17:26:58 +00:00
Craig Topper 3dd4aa7d09 [RISCV] When custom iseling masked loads/stores, copy the mask into V0 instead of virtual register.
This matches what we do in our isel patterns. In our internal
testing we've found this is needed to make the fast register
allocator happy at -O0. Otherwise it may assign V0 to an earlier
operand and find itself with no registers left when it reaches
the mask operand. By using V0 explicitly, the fast register allocator
will see it when it checks for phys register usages before it
starts allocating vregs. I'll try to update this with a test case.

Unfortunately, this does appear to prevent some instruction reordering
by the pre-RA scheduler which leads to the increased spills seen in
some tests. I suspect that problem could already occur for other
instructions that already used V0 directly.

There's a lot of repeated code here that could do with some
wrapper functions. Not sure if that should be at the level of the
new code that deals with V0. That would require multiple output
parameters to pass the glue, chain and register back. Maybe it
should be at a higher level over the entire set of push_backs.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D99367
2021-03-29 10:20:43 -07:00
Craig Topper 54bacaf311 [X86] Always use rip-relative addressing on 64-bit when rematerializing all zeros/ones registers using a folded load.
Previously we only used RIP relative when PIC was enabled. But
we know we're in small/kernel code model here so we should
be able to always use RIP-relative which will give a smaller
encoding.

Here's a godbolt link that demonstrates the current codegen https://godbolt.org/z/j3158o
Note in the non-PIC version the load from .LCPI0_0 doesn't use
RIP-relative addressing, but if you change the constant in the
source from 0.0 to 1.0 it will become RIP-relative.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97208
2021-03-29 10:06:17 -07:00
Roger Ferrer Ibanez ef76a333fa [RISCV] Fix offset computation for RVV
In D97111 we changed the RVV frame layout when using sp or bp to address
the stack slots so we could address the emergency stack slot. The idea
is to put the RVV objects as far as possible (in offset terms) from the
frame reference register (sp / fp / bp).

When using fp this happens naturally because the RVV objects are already
the top of the stack and due to the constraints of RVV (VLENB being a
power of two >= 128) the stack remains aligned. The rest of this summary
does not apply to this case.

When using sp / bp we need to skip the non-RVV stack slots. The size of
the the non-RVV objects is computed subtracting the callee saved
register size (whose computation is added in D97111 itself) to the total
size of the stack (which does not account for RVV stack slots). However,
when doing so we round to 16 bytes when computing that size and we end
emitting a smaller offset that may belong to a scalar stack slot (see
D98801). So this change removes that rounding.

Also, because we want the RVV objects be between the non-RVV stack slots
and the callee-saved register slots, we need to make sure the RVV
objects are properly aligned to 8 bytes. Adding a padding of 8 would
render the stack unaligned. So when allocating space for RVV (only when
we don't use fp) we need to have extra padding that preserves the stack
alignment. This way we can round to 8 bytes the offset that skips the
non-RVV objects and we do not misalign the whole stack in the way. In
some circumstances this means that the RVV objects may have padding
before (=lower offsets from sp/bp) and after (before the CSR stack
slots).

Differential Revision: https://reviews.llvm.org/D98802
2021-03-29 17:03:49 +00:00
Wenlei He 30b0232336 [CSSPGO][llvm-profgen] Context-sensitive global pre-inliner
This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen.

It will serve two purposes:
  1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size.
  2) For thinLTO, when a context involving functions from different modules is not inined, we can't merge functions profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation.

Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler.

This change only has the framework, with a few TODOs left for follow up patches for a complete implementation:
  1) We need to retrieve size for funciton//inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as place holder for size estimation.
  2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR.

Differential Revision: https://reviews.llvm.org/D99146
2021-03-29 09:46:14 -07:00
Wei Mi 3cbf44190b [SampleFDO] Do not scale the magic number NOMORE_ICP_MAGICNUM in value profile
during profile update.

When we inline a function and update the profile, the value profiles of the
indirect call in the inliner and inlinee will be scaled. In
https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we start
using the magic number NOMORE_ICP_MAGICNUM (-1) to mark targets which have
been promoted. The magic number shouldn't be scaled during the profile update.

Although the problem has been suppressed by https://reviews.llvm.org/D98187
for SampleFDO, which stops profile update for inlining in sampleFDO, the patch
is still wanted since it will be more consistent to handle the magic number
properly in profile update.

Differential Revision: https://reviews.llvm.org/D99394
2021-03-29 09:34:37 -07:00
Florian Hahn c773d0f973
Recommit "[LV] Move runtime pointer size check to LVP::plan()."
Re-apply 25fbe803d4, with a small update to emit the right remark
class.

Original message:
    [LV] Move runtime pointer size check to LVP::plan().

    This removes the need for the remaining doesNotMeet check and instead
    directly checks if there are too many runtime checks for vectorization
    in the planner.

    A subsequent patch will adjust the logic used to decide whether to
    vectorize with runtime to consider their cost more accurately.

    Reviewed By: lebedev.ri
2021-03-29 16:14:27 +01:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Florian Hahn 485c8ce733
Revert "[LV] Move runtime pointer size check to LVP::plan()."
This reverts commit 25fbe803d4.

This breaks a clang test which filters for the wrong remark type.
2021-03-29 14:41:53 +01:00
Sanjay Patel da381cf7ce [SLP] allow matching integer min/max intrinsics as reduction ops
This is a 2nd try of:
3c8473ba53
which was reverted at:
 a26312f9d4
because of crashing.

This version includes extra code and tests to avoid the known
crashing examples as discussed in PR49730.

Original commit message:
As noted in D98152, we need to patch SLP to avoid regressions when
we start canonicalizing to integer min/max intrinsics.
Most of the real work to make this possible was in:
7202f47508

Differential Revision: https://reviews.llvm.org/D98981
2021-03-29 09:38:18 -04:00
Paul C. Anagnostopoulos 5f473a04af [TableGen] Add support for the 'assert' statement in class definitions.
Differential Revision: https://reviews.llvm.org/D99275
2021-03-29 09:20:29 -04:00
Florian Hahn 25fbe803d4
[LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.

A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime to consider their cost more accurately.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98634
2021-03-29 14:12:29 +01:00
Matt Arsenault 9a0c9402fa Reapply "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 07e46367ba.
2021-03-29 08:55:30 -04:00
Jingu Kang e4abb64100 [LoopUnswitch] Use reference variables instead of pointer one
Differential Revision: https://reviews.llvm.org/D99496
2021-03-29 13:08:46 +01:00
Hans Wennborg c6e5c4654b Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places
Using $ breaks demangling of the symbols. For example,

$ c++filt _Z3foov\$123
_Z3foov$123

This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.

Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:

$ c++filt _Z3foov.123
foo() [clone .123]

This is already done in some places; try to do it everywhere.

Differential revision: https://reviews.llvm.org/D97484
2021-03-29 13:03:52 +02:00
Oliver Stannard 07e46367ba Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute""
Reverting because test 'Bindings/Go/go.test' is failing on most
buildbots.

This reverts commit fc9df30991.
2021-03-29 11:32:22 +01:00
Simon Pilgrim 805148eaf2 [X86][SSE] combineHorizOpWithShuffle - consistently use getTargetShuffleInputs to decode shuffles
Minor cleanup before I start trying to merge the unary/binary shuffle combining paths.
2021-03-29 11:31:19 +01:00
Nashe Mncube 19601a4c6c [SVE][Analysis]Instruction costs for ops on scalable-vec
The following operations have no associated cost for them
when applied to scalable vectors, and as a consequence
can trigger a crash when a call is made to
AArch64TTIImpl::getCastInstrCost():
- fptrunc
- trunc
- fpext
- fpto(u,s)i

This patch adds costs for these operations and
relevant regression tests.

Differential Revision: https://reviews.llvm.org/D98934
2021-03-29 11:15:50 +01:00
Jingu Kang cfe87d4edd [NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils
Differential revision: https://reviews.llvm.org/D99490
2021-03-29 10:29:45 +01:00
David Green 3a68c6d26c [ARM] Extend MVE lane interleaving to handle other non-instruction leaves
This extends the recent MVE lane interleaving passto handle other
non-instruction leaves, for which a new shuffle is added. This helps
especially for constants and potentially for arguments.

Differential Revision: https://reviews.llvm.org/D97289
2021-03-29 09:05:45 +01:00
Lang Hames 666df2e2cb [ORC][C-bindings] Fix some ORC C bindings function names and signatures.
LLVMOrcDisposeObjectLayer and LLVMOrcExecutionSessionGetJITDylibByName did not
have matching signatures between the C-API header and binding implementations.
Fixes http://llvm.org/PR49745.

Patch by Mats Larsen. Thanks Mats!

Reviewed by: lhames

Differential Revision: https://reviews.llvm.org/D99478
2021-03-28 16:30:47 -07:00
David Green 6c88ffeda3 [ARM] Fix the Changed value in the MVE lane interleaving pass. 2021-03-28 23:47:53 +01:00
Nikita Popov ce066da81c [BasicAA] Make sure types match in constant offset heuristic
This can only happen if offset types that are larger than the
pointer size are involved. The previous implementation did not
assert in this case because it initialized the APInts to the
width of one of the variables -- though I strongly suspect it
did not compute correct results in this case.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32621
reported by fhahn.
2021-03-28 21:38:09 +02:00
Craig Topper 69bdf35dc7 [X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size.
For these cases we need to extract the upper or lower elements,
multiply them using 16-bit multiplies and repack them.

Previously we used punpcklbw/punpckhbw+psraw or pmovsxbw+pshudfd to
extract and sign extend so we could use pmullw to compute the 16-bit
product and then shift down the high bits.

We can avoid the need to sign extend if we unpack the bytes into
the high byte of each word and fill the lower byte with 0 using
pxor. This puts the sign bit of each byte into the sign bit of
each word. Since the LHS and RHS have 8 trailing zeros, the full
32-bit product of those 16-bit values will have 16 trailing zeros.
This means the 16-bit product of the original bytes is in the upper
16 bits which we can calculate using pmulhw.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98587
2021-03-28 11:41:29 -07:00
David Green 7b6f760fcd [ARM] MVE vector lane interleaving
MVE does not have a single sext/zext or trunc instruction that takes the
bottom half of a vector and extends to a full width, like NEON has with
MOVL. Instead it is expected that this happens through top/bottom
instructions. So the MVE equivalent VMOVLT/B instructions take either
the even or odd elements of the input and extend them to the larger
type, producing a vector with half the number of elements each of double
the bitwidth. As there is no simple instruction for a normal extend, we
often have to expand sext/zext/trunc into a series of lane moves (or
stack loads/stores, which we do not do yet).

This pass takes vector code that starts at truncs, looks for
interconnected blobs of operations that end with sext/zext and
transforms them by adding shuffles so that the lanes are interleaved and
the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so
that it can work across basic blocks.

This initial version of the pass just handles a limited set of
instructions, not handling constants or splats or FP, which can all come
as extensions to this base.

Differential Revision: https://reviews.llvm.org/D95804
2021-03-28 19:34:58 +01:00
Matt Arsenault fc9df30991 Reapply "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 20d5c42e0e.
2021-03-28 13:35:21 -04:00
Sanjay Patel 01ae6e5ead [InstCombine] sink min/max intrinsics with common op after select
This is another step towards parity with cmp+select min/max idioms.

See D98152.
2021-03-28 13:13:04 -04:00
Nico Weber 20d5c42e0e Revert "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 4fefed6563.
Broke check-clang everywhere.
2021-03-28 13:02:52 -04:00
Matt Arsenault 4fefed6563 OpaquePtr: Turn inalloca into a type attribute
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
2021-03-28 11:12:23 -04:00