Commit Graph

149978 Commits

Author SHA1 Message Date
Jon Chesterfield 21d91a8ef3 [libomptarget][devicertl] Replace lanemask with uint64 at interface
Use uint64_t for lanemask on all GPU architectures at the interface
with clang. Updates tests. The deviceRTL is always linked as IR so the zext
and trunc introduced for wave32 architectures will fold after inlining.

Simplification partly motivated by amdgpu gfx10 which will be wave32 and
is awkward to express in the current arch-dependant typedef interface.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108317
2021-08-18 20:47:33 +01:00
Anton Afanasyev cfb6dfcbd1 [AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG
Add `lshr` instruction to the DAG post-dominated by `trunc`, allowing
TruncInstCombine to reduce bitwidth of expressions containing
these instructions.

We should be shifting by less than the target bitwidth.
Also it is sufficient to require that all truncated bits
of the value-to-be-shifted are zeros: https://alive2.llvm.org/ce/z/_LytbB

Alive2 variable-length proof:
https://godbolt.org/z/1srE1aqzf => s/32/8/ => https://alive2.llvm.org/ce/z/StwPia

Part of https://reviews.llvm.org/D107766

Differential Revision: https://reviews.llvm.org/D108201
2021-08-18 22:20:58 +03:00
Ali Sedaghati cc7bcef3e3 Reapply: [NFC] factor out unrolling decision logic
reverting ffd8a268bd (reapplying
4d559837e8) - removed spurious inclusion
of <optional>

Differential Revision: https://reviews.llvm.org/D106001
2021-08-18 12:04:33 -07:00
Geoffrey Martin-Noble ffd8a268bd Revert "[NFC] factor out unrolling decision logic"
This patch added a requirement for C++17, while LLVM is supposed to
build with C++14
(https://llvm.org/docs/CodingStandards.html#c-standard-versions). Posted
a note to the original review thread (https://reviews.llvm.org/D106001).

This reverts commit 4d559837e8.

Differential Revision: https://reviews.llvm.org/D108314
2021-08-18 11:38:48 -07:00
Joe Nash 9dbc968ed9 [AMDGPU] Fix atomic float max/min intrinsics
Hooked up raw.buffer.atomic.fmin/max.f64
This instruction should be available on GFX6, GFX7, and GFX10.
It was implemented for GFX90a with a different name.

Added intrinsic def for image_atomic_fmin/fmax; the instruction
defs were already there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D108208

Change-Id: I473f98d28b2afbeeb2c27822d9686b5e86634e2f
2021-08-18 14:12:42 -04:00
Nikita Popov 3dd8c9176b [LICM] Remove AST-based implementation
MSSA-based LICM has been enabled by default for a few years now.
This drops the old AST-based implementation. Using loop(licm) will
result in a fatal error, the use of loop-mssa(licm) is required
(or just licm, which defaults to loop-mssa).

Note that the core canSinkOrHoistInst() logic has to retain AST
support for now, because it is shared with LoopSink.

Differential Revision: https://reviews.llvm.org/D108244
2021-08-18 20:21:53 +02:00
Ali Sedaghati 4d559837e8 [NFC] factor out unrolling decision logic
Decoupling the unrolling logic into three different functions. The shouldPragmaUnroll() covers the 1st and 2nd priorities of the previous code, the shouldFullUnroll() covers the 3rd, and the shouldPartialUnroll() covers the 5th. The output of each function, Optional<unsigned>, could be a value for UP.Count, which means unrolling factor has been set, or None, which means decision hasn't been made yet and should try the next priority.

Reviewed By: mtrofin, jdoerfert

Differential Revision: https://reviews.llvm.org/D106001
2021-08-18 11:21:40 -07:00
Arthur Eubanks fde0eb1f9a [NFC] A couple more removeAttribute() cleanups 2021-08-18 11:15:20 -07:00
Arthur Eubanks 2fc075948c [NFC] Remove some unnecessary AttributeList methods
These rely on methods I'm trying to cleanup.
2021-08-18 11:15:20 -07:00
Craig Topper 3f9b37ccb1 [RISCV] Remove sext_inreg+add/sub/mul/shl isel patterns.
Let the sext_inreg be selected to sext.w. Remove unneeded sext.w
during PostProcessISelDAG.

This gives opportunities for some other isel patterns to match
like the ADDIPair or matching mul with immediate to shXadd.

This becomes possible after D107658 started selecting W instructions
based on users. The sext.w will be considered a W user so isel
will often select a W instruction for the sext.w input and we can
just remove the sext.w. Otherwise we can combine the sext.w with
a ADD/SUB/MUL/SLLI to create a new W instruction in parallel
to the the original instruction.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107708
2021-08-18 11:07:11 -07:00
Jessica Paquette 791006fb8c [GlobalISel] Implement lowering for G_ISNAN + use it in AArch64
GlobalISel equivalent to `TargetLowering::expandISNAN`.

Use it in AArch64 and add a testcase.

Differential Revision: https://reviews.llvm.org/D108227
2021-08-18 10:54:25 -07:00
Han Zhu 687f046c97 [NFC][loop-idiom] Rename Stores to IgnoredInsts; Fix a typo
When dealing with memmove, we also add the load instruction to the ignored
instructions list passed to `mayLoopAccessLocation`. Renaming "Stores" to
"IgnoredInsts" to be more precise.

Differential Revision: https://reviews.llvm.org/D108275
2021-08-18 10:52:16 -07:00
Jessica Paquette d9873711cb [GlobalISel] Add IRTranslator support for G_ISNAN
Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it.

This is pretty much the same as the associated SelectionDAGBuilder code. Main
difference is that we don't expand it here. It makes more sense to do that
during legalization in GlobalISel. GlobalISel will just legalize the generated
illegal types.

Differential Revision: https://reviews.llvm.org/D108226
2021-08-18 10:48:10 -07:00
Craig Topper 6d7ea597ef [RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS.
We already do this for non-constants RHS. This just removes the
special case. I believe the special case may have been needed
because the ANY_EXTEND of a constant used to create zero extended
constants, but we recently changed that to produce sign extended
constants.

D107658 is needed to prevent some regressions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107697
2021-08-18 10:44:25 -07:00
Jessica Paquette 0a2b1ba33a [GlobalISel] Add G_ISNAN
Add a generic opcode equivalent to the `llvm.isnan` intrinsic +
MachineVerifier support for it.

We need an opcode here because we may want target-specific lowering later on.

Differential Revision: https://reviews.llvm.org/D108222
2021-08-18 10:42:05 -07:00
Craig Topper 20e6265873 [RISCV] Improve constant materialization for stores of i16 or i32 negative constants.
DAGCombiner::visitStore can clear the upper bits of constants
used by stores. This leads prevents them from being recognized as
sign extended negative values making them more expensive to
materialize.

This patch uses the hasAllNBitUsers method from D107658 to make
a negative constant if none of the users care about the upper bits.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D108052
2021-08-18 10:25:12 -07:00
Craig Topper d9ba1a9c5c [RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used.
We normally select these when the root node is a sext_inreg, but
SimplifyDemandedBits can sometimes bypass the sext_inreg for some
users. This can create situation where sext_inreg+add/sub/mul/shl
is selected to a W instruction, and then the add/sub/mul/shl is
separately selected to a non-W instruction with the same inputs.

This patch tries to detect when it would still be ok to use a W
instruction without the sext_inreg by checking the direct users.
This can allow the W instruction to CSE with one created for a
sext_inreg+add/sub/mul/shl. To minimize complexity and cost of
checking, we make no attempt to determine if the CSE will happen
and just always use a W instruction when we can.

Differential Revision: https://reviews.llvm.org/D107658
2021-08-18 10:22:00 -07:00
Craig Topper f70238914a [RISCV] Add zext.h/zext.w to RISCVTTIImpl::getIntImmCostInst.
If we have these instructions, we don't need to hoist the immediate
for an AND that would match them.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107783
2021-08-18 09:40:40 -07:00
Arthur Eubanks 7557d6c896 [NFC] Cleanup calls to CallBase::getAttribute() 2021-08-18 09:39:33 -07:00
Florian Mayer 164e09de2e [hwasan] Default -hwasan-use-stack-safety to off.
This very occasionally causes to an assertion failure in the compiler.
Turning off until we can get to the bottom of this.

Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D108282
2021-08-18 17:21:32 +01:00
Kazu Hirata e0ff1e9659 [Bitcode] Remove unused declaration writeGlobalVariableMetadataAttachment (NFC)
The declaration was introduced without a corresponding definition on
May 31, 2016 in commit cceae7feda.
2021-08-18 09:16:05 -07:00
David Sherwood 219d4518fc [Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive
For tight loops like this:

  float r = 0;
  for (int i = 0; i < n; i++) {
    r += a[i];
  }

it's better not to vectorise at -O3 using fixed-width ordered reductions
on AArch64 targets. Although the resulting number of instructions in the
generated code ends up being comparable to not vectorising at all, there
may be additional costs on some CPUs, for example perhaps the scheduling
is worse. It makes sense to deter vectorisation in tight loops.

Differential Revision: https://reviews.llvm.org/D108292
2021-08-18 17:01:56 +01:00
Joseph Huber 13d8f000d7 [OpenMP][NFC] Improve debug message for shared memory
Summary:
Make the debug message for HeapToShared more helpful by showing the
actual call.
2021-08-18 11:56:09 -04:00
Arthur Eubanks 3af250ff1e Add some Function method definitions accidentally removed
In cc327bd523.
2021-08-18 08:28:57 -07:00
Joseph Huber 58f9326487 [OpenMP] Change AAKernelInfo to ignore non-kernels
Currently, AAKernelInfo will fail on an assertion if we attempt to run
it on a kernel without the init / deinit runtime calls. However, this
occurs for global constructors on the device. This will cause OpenMPOpt
to crash whenever global constructors are present. This patch removes
this assertion and just gives up instead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108258
2021-08-18 11:24:29 -04:00
Bing1 Yu ffe58de393 [X86] [AMX] Fix the test case failure caused by D107544.
The issue can be duplicated when EXPENSIVE_CHECKS is specified for llvm
build. Thank Simon report this issue at
https://bugs.llvm.org/show_bug.cgi?id=51513. We need return correct
value for the changed IR.

Reviewed By: RKSimon, LuoYuanke

Differential Revision: https://reviews.llvm.org/D108269
2021-08-18 22:27:22 +08:00
Lang Hames 45ac5f5441 Revert "[ORC-RT][ORC] Introduce ELF/*nix Platform and runtime support."
This reverts commit e256445bff.

This commit broke some of the bots (see e.g.
https://lab.llvm.org/buildbot/#/builders/112/builds/8599). Reverting while I
investigate.
2021-08-18 20:42:23 +10:00
Tim Northover 8eb054a87d AArch64: compare correct type for multi-valued SDNode.
If Orig produces more than one value (rare) with different types (rarer) then
we need to make sure we check against the one that Orig actually represents,
not just the first type.

Unfortunately because of the combination of things that need to happen I wasn't
able to produce a test.
2021-08-18 09:35:31 +01:00
Lang Hames 29ff2e879f [JILink][ELF] Include binding and visibility values in error messages.
This should make it easier to track down JITLink errors for unrecognized
binding or visibility types, e.g.
https://lab.llvm.org/buildbot#builders/112/builds/8599.
2021-08-18 18:19:06 +10:00
Amara Emerson 284006079e [AArch64][GlobalISel] Add support for selection of s8:fpr = G_UNMERGE <8 x s8> 2021-08-18 00:34:06 -07:00
Petr Hosek 2d4470ab89 Revert "Allow rematerialization of virtual reg uses"
This reverts commit 877572cc19 which
introduced PR51516.
2021-08-18 00:12:41 -07:00
Petr Hosek 1c84167149 [InstrProfiling][NFC] Initialize MadeChange variable
This addresses an issue introduced in 389dc94d4b which triggers
a crash on Windows.
2021-08-17 23:33:38 -07:00
Christudasan Devadasan 4f5ba46e16 [AMDGPU] Set wait state for meta instructions to zero
It looked more reasonable to set the wait state to
zero for all non-instructions. With that we can avoid
the special handling for them in `getWaitStatesSince`
and `AdvanceCycle`. This NFC patch makes the handling
more generic.
2021-08-18 01:46:59 -04:00
Anton Afanasyev 803270c0c6 [AggressiveInstCombine] Fix unsigned overflow
Fix issue reported here: https://reviews.llvm.org/D108091#2950930
2021-08-18 08:42:46 +03:00
Lang Hames e256445bff [ORC-RT][ORC] Introduce ELF/*nix Platform and runtime support.
This change adds support to ORCv2 and the Orc runtime library for static
initializers, C++ static destructors, and exception handler registration for
ELF-based platforms, at present Linux and FreeBSD on x86_64. It is based on the
MachO platform and runtime support introduced in bb5f97e3ad.

Patch by Peter Housel. Thanks very much Peter!

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108081
2021-08-18 15:00:22 +10:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Arthur Eubanks de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Arthur Eubanks cc327bd523 [NFC] Cleanup attribute methods in Function 2021-08-17 21:05:40 -07:00
Arthur Eubanks ad727ab7d9 [NFC] Migrate some callers away from Function/AttributeLists methods that take an index
These methods can be confusing.
2021-08-17 21:05:40 -07:00
Arthur Eubanks 46cf82532c [NFC] Replace Function handling of attributes with less confusing calls
To avoid magic constants and confusing indexes.
2021-08-17 21:05:40 -07:00
Jun Ma 9934a5b2ed [CVP] processSwitch: Remove default case when switch cover all possible values.
Differential Revision: https://reviews.llvm.org/D106056
2021-08-18 10:23:13 +08:00
Qiu Chaofan 5ca250a03d [RegAlloc] Remove addAllocPriorityToGlobalRanges hook
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D108010
2021-08-18 10:21:27 +08:00
jacquesguan a7ebc4d145 [DAGCombiner] Teach isKnownToBeAPowerOfTwo handle SPLAT_VECTOR
Make DAGCombine turn mul by power of 2 into shl for scalable vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107883
2021-08-18 10:10:40 +08:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Philip Reames 94d0914292 [runtimeunroll] Support multiple exits to latch exit w/epilogue loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.

I decided to restrict this to the epilogue case. Given the changes ended up being pretty generic, we may be able to unblock the prolog case too, but I want to do that in a separate change to reduce the amount of code we all have to understand at one time.

Differential Revision: https://reviews.llvm.org/D107381
2021-08-17 17:52:04 -07:00
Florian Mayer 8f750e8814 [hwasan] [NFC] pull out helper function.
Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D107334
2021-08-17 23:31:47 +01:00
Mark Danial 4018d25da8 LoopNest Analysis expansion to return instructions that prevent a Loop
Nest from being perfect

Expand LoopNestAnalysis to return the full list of instructions that
cause a loop nest to be imperfect. This is useful for other passes to
know if they should continue for in the inner loops.
Added New function getInterveningInstructions
that returns a small vector with the instructions that prevent a loop
for being perfect. Also added a couple of helper functions to reduce
code duplication.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D107773
2021-08-17 22:25:49 +00:00
Arthur Eubanks 16890e0040 [GlobalOpt] Check stored once value's type before setting global initializer
In the provided test case, we were trying to set the global's
initializer to `i32* null` when the global's value type was `@0`.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108232
2021-08-17 14:34:29 -07:00
Simon Pilgrim 359cfa2af7 [X86] EmitInstrWithCustomInserter - silence uninitialized variable warnings. NFCI.
Consistently use llvm_unreachable for the default case in the inner switch statements.
2021-08-17 22:01:07 +01:00
Adrian Prantl 8ae5e0b154 Add missing nullptr check
Unfortunatley the IR Verifier doesn't reject debug intrinsics that
have nullptr as arguments, so coro::salvageDebugInfo for now also
needs to deal with them.

rdar://81979541
2021-08-17 13:59:52 -07:00
Nikita Popov e918ba6958 [LICM] Drop -licm-n2-threshold option
This was a diagnostic option used to demonstrate a weakness in
the AST-based LICM implementation. This problem does not exist
in the MSSA-based LICM implementation, which has been enabled
for a long time now. As such, this option is no longer relevant.
2021-08-17 22:41:31 +02:00
Nikita Popov f58a642da1 [PassBuilder] Use loop-mssa for licm
Currently specifying -licm or -passes=licm will implicitly create
-passes=loop(licm). This does not match the intended default (used
by the legacy PM and by the default pipeline) of using the
MemorySSA-based LICM implementation. As I plan to drop the non-MSSA
implementation, this will stop working entirely...

This special-cases licm to create a loop-mssa manager instead. At
this point it's still possible to use -passes='loop(licm)' to opt
into the AST-based implementation.

Differential Revision: https://reviews.llvm.org/D108155
2021-08-17 21:23:11 +02:00
Sanjay Patel 50c1138796 [InstCombine] add TODO about another min/max fold; NFC
Suggested in post-commit for d0975b7cb0
2021-08-17 14:14:25 -04:00
Craig Topper 8f6cea43e7 [RISCV] Use RISCV::RVVBitsPerBlock for RGK_ScalableVector in getRegisterBitWidth.
I might be wrong, but I think this is should be width of the known
min size we use for scalable vectors. It shouldn't scale with
minimum vlen.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107945
2021-08-17 11:13:15 -07:00
Simon Pilgrim d7f288502f SelectionDAGBuilder::visitInlineAsm - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warning.
2021-08-17 18:40:59 +01:00
Simon Pilgrim caff2acae1 [AArch64] AArch64DAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warnings.
2021-08-17 18:40:59 +01:00
Simon Pilgrim 1e770f0388 [ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warnings.
2021-08-17 18:40:59 +01:00
Simon Pilgrim fb81271e8b [AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32
As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that coverity reported as dead code.

Differential Revision: https://reviews.llvm.org/D108210
2021-08-17 18:09:57 +01:00
Fraser Cormack f3e9047249 [VP] Add vector-predicated reduction intrinsics
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when the no vector element is active.

Support for expansion on targets without native vector-predication support is
included.

This patch is based on the ["reduction
slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104308
2021-08-17 17:56:35 +01:00
Joseph Huber 339aa76526 [OpenMP][NFC] Add option to print module after OpenMPOpt for debugging
This patch adds an extra option to print the module after running one of
the OpenMPOpt passes if debugging is enabled. This makes it much easier
to inspect the effects of this pass when doing debugging.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108146
2021-08-17 12:46:10 -04:00
Philip Reames 982da7a20c [SCEVExpander] Stop hoisting IR when reusing phis
his is a fix for PR43678, and is an alternate patch to D105723.

The basic issue we're running into is that LSR + SCEVExpander are moving the very instruction whose operand we're in the process of expanding. This breaks the subtle and ill-documented invariant which let LSR work. (Full story can be found here: https://reviews.llvm.org/D105723#2878473)

Rather than attempting a fix, this change just removes the optimization entirely. The code is entirely untested, and removing it appears to have no impact I can find.  This code was added back in 2014 by 1e12f8563d with a single test which does not seem to actually test the hoisting logic.

From a philosophical standpoint, it also seems very strange to have the expander implementing optimizations which should live in a dedicated transform pass.

Differential Revision: https://reviews.llvm.org/D106178
2021-08-17 09:38:32 -07:00
Fangrui Song 78cb1adc5c [Object] Move llvm-nm's symbol version utility to ELFObjectFile::readDynsymVersions
The utility can be reused by llvm-objdump -T.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D108096
2021-08-17 09:06:39 -07:00
Roman Lebedev 2078c4ecfd
[X86] Lower insertions into upper half of an 256-bit vector as broadcast+blend (PR50971)
Broadcast is not worse than extract+insert of subvector.
https://godbolt.org/z/aPq98G6Yh

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D105390
2021-08-17 18:45:10 +03:00
Tozer 5c6f748cbc [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983

The AsmLexer currently has an issue with lexing line comments in files
with CRLF line endings, in which it reads the carriage return as being
part of the line comment. This causes an error for certain valid comment
layouts; this patch fixes this by excluding the carriage return from the
line comment.

Differential Revision: https://reviews.llvm.org/D90234
2021-08-17 15:52:51 +01:00
Kazu Hirata a14920c002 [Bitcode] Remove unused declaration writeBitcodeHeader (NFC)
The corresponding definition was removed on Nov 29, 2016 in commit
5a0a2e648c.
2021-08-17 07:10:51 -07:00
Dylan Fleming ef198cd99e [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D106277
2021-08-17 14:42:47 +01:00
David Green 52e0cf9d61 [ARM] Enable subreg liveness
This enables subreg liveness in the arm backend when MVE is present,
which allows the register allocator to detect when subregister are
alive/dead, compared to only acting on full registers. This can helps
produce better code on MVE with the way MQPR registers are made up of
SPR registers, but is especially helpful for MQQPR and MQQQQPR
registers, where there are very few "registers" available and being able
to split them up into subregs can help produce much better code.

Differential Revision: https://reviews.llvm.org/D107642
2021-08-17 14:10:33 +01:00
David Green 62e892fa2d [ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions
As a part of D107642, this adds pseudo instructions for MQQPR and
MQQQQPR register classes, that can spill and reloads entire registers
whilst keeping them combined, not splitting them into multiple D subregs
that a VLDMIA/VSTMIA would use. This can help certain analyses, and
helps to prevent verifier issues with subreg liveness.
2021-08-17 13:51:34 +01:00
Sanjay Patel e73f4e1123 [InstCombine] remove unused function argument; NFC 2021-08-17 08:10:42 -04:00
Sanjay Patel d0975b7cb0 [InstCombine] fold signed min/max intrinsics with negated operands
If both operands are negated, we can invert the min/max and do
the negation after:
smax (neg nsw X), (neg nsw Y) --> neg nsw (smin X, Y)
smin (neg nsw X), (neg nsw Y) --> neg nsw (smax X, Y)

This is visible as a remaining regression in D98152. I don't see
a way to generalize this for 'unsigned' or adapt Negator to
handle it. This only appears to be safe with 'nsw':
https://alive2.llvm.org/ce/z/GUy1zJ

Differential Revision: https://reviews.llvm.org/D108165
2021-08-17 08:10:42 -04:00
Sebastian Neubauer fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
Tiehu Zhang 9cfa9b44a5 [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree,
which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107262
2021-08-17 18:58:15 +08:00
Jeremy Morse 708cbda577 [DebugInfo][InstrRef] Honour too-much-debug-info cutouts
This reapplies 54a61c94f9, its follow up in 547b712500, which were
reverted 95fe61e639. Original commit message:

VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.

Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.

Differential Revision: https://reviews.llvm.org/D107823
2021-08-17 11:34:49 +01:00
Simon Pilgrim 895ed64009 [AArch64] LowerCONCAT_VECTORS - merge getNumOperands() calls. NFCI.
Improves on the unused variable fix from rG4357562067003e25ab343a2d67a60bd89cd66dbf
2021-08-17 11:23:03 +01:00
Anton Afanasyev 1f3e35b6d1 [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG
Add `shl` instruction to the DAG post-dominated by `trunc`, allowing
TruncInstCombine to reduce bitwidth of expressions containing left shifts.

The only thing we need to check is that the target bitwidth must be wider
than the maximal shift amount: https://alive2.llvm.org/ce/z/AwArqu

Part of https://reviews.llvm.org/D107766

Differential Revision: https://reviews.llvm.org/D108091
2021-08-17 12:44:37 +03:00
Bing1 Yu bcec4ccd04 [X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast.
There is some discussion on the bitcast for vector and x86_amx at https://reviews.llvm.org/D99152. This patch is to introduce a x86 specific cast for vector and x86_amx, so that it can avoid some unnecessary optimization by middle-end. On the other way, we have to optimize the x86 specific cast by ourselves. This patch also optimize the cast operation to eliminate redundant code.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D107544
2021-08-17 17:04:26 +08:00
David Stuttard ebdb0d09a4 AMDGPU: During img instruction ret value construction cater for non int values
Make sure return type is int type.

Differential Revision: https://reviews.llvm.org/D108131

Change-Id: Ic02f07d1234cd51b6ed78c3fecd2cb1d6acd5644
2021-08-17 09:08:24 +01:00
Kazu Hirata 8f5e9d65d6 [AsmParser] Remove MDConstant (NFC)
The last use was removed on Sep 22, 2016 in commit
fcee2d8001.
2021-08-16 21:21:11 -07:00
Whitney Tsang a41c95c0e3 [LNICM] Fix infinite loop
There is a bug introduced by https://reviews.llvm.org/D107219 which causes an infinite loop, when there are more than 2 levels PHINode chain.

Reviewed By: uint256_t

Differential Revision: https://reviews.llvm.org/D108166
2021-08-17 12:55:22 +09:00
Christudasan Devadasan 686607676f [AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE
are only placeholders to prevent certain unwanted
transformations and will get discarded during assembly
emission. They should not be counted during nop insertion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108022
2021-08-16 23:11:14 -04:00
Carl Ritson 99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Hongtao Yu f27fee623d [SamplePGO][NFC] Dump function profiles in order
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D108147
2021-08-16 17:22:30 -07:00
Arthur Eubanks 0d822da2bd [NFC] Remove/replace some confusing attribute getters on Function 2021-08-16 16:12:37 -07:00
Min-Yih Hsu eec3495a9d [M68k] Do not pass llvm::Function& to M68kCCState
Previously we're passing `llvm::Function&` into `M68kCCState` to lower
arguments in fastcc. However, that reference might not be available if
it's a library call and we only need its argument types. Therefore,
now we're simply passing a list of argument llvm::Type-s.

This fixes PR-50752.

Differential Revision: https://reviews.llvm.org/D108101
2021-08-16 15:33:08 -07:00
Afanasyev Ivan 913b5d2f7a [AsmPrinter] fix nullptr dereference for MBBs with hasAddressTaken property without BB
Basic block pointer is dereferenced unconditionally for MBBs with
hasAddressTaken property.

MBBs might have hasAddressTaken property without reference to BB.
Backend developers must assign fake BB to MBB to workaround this issue
and it should be fixed.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108092
2021-08-16 15:32:09 -07:00
David Green 9236dea255 [ARM] Create MQQPR and MQQQQPR register classes
Similar to the MQPR register class as the MVE equivalent to QPR, this
adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR
and QQQQPR registers. The MVE MQPR seemed have worked out quite well,
and adding MQQPR and MQQQQPR allows us to a little more accurately
specify the number of registers, calculating register pressure limits a
little better.

Differential Revision: https://reviews.llvm.org/D107463
2021-08-16 22:58:12 +01:00
Anshil Gandhi f22ba51873 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating a
compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-16 14:56:01 -06:00
Stanislav Mekhanoshin 877572cc19 Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-16 12:42:42 -07:00
Rong Xu 9b8425e42c Reapply commit b7425e956
The commit b7425e956: [NFC] fix typos
is harmless but was reverted by accident. Reapply.
2021-08-16 12:18:40 -07:00
Stanislav Mekhanoshin b9e433b02a Prevent machine licm if remattable with a vreg use
Check if a remateralizable nstruction does not have any virtual
register uses. Even though rematerializable RA might not actually
rematerialize it in this scenario. In that case we do not want to
hoist such instruction out of the loop in a believe RA will sink
it back if needed.

This already has impact on AMDGPU target which does not check for
this condition in its isTriviallyReMaterializable implementation
and have instructions with virtual register uses enabled. The
other targets are not impacted at this point although will be when
D106408 lands.

Differential Revision: https://reviews.llvm.org/D107677
2021-08-16 12:09:00 -07:00
Nikita Popov 735a590471 [MemorySSA] Remove -enable-mssa-loop-dependency option
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.

The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.

Differential Revision: https://reviews.llvm.org/D108075
2021-08-16 20:59:37 +02:00
Nikita Popov 570c9beb8e [MemorySSA] Remove unnecessary MSSA dependencies
LoopLoadElimination, LoopVersioning and LoopVectorize currently
fetch MemorySSA when construction LoopAccessAnalysis. However,
LoopAccessAnalysis does not actually use MemorySSA and we can pass
nullptr instead.

This saves one MemorySSA calculation in the default pipeline, and
thus improves compile-time.

Differential Revision: https://reviews.llvm.org/D108074
2021-08-16 20:40:55 +02:00
Nikita Popov 0a031449b2 [PassBuilder] Don't use MemorySSA for standalone LoopRotate passes
Two standalone LoopRotate passes scheduled using
createFunctionToLoopPassAdaptor() currently enable MemorySSA.
However, while LoopRotate can preserve MemorySSA, it does not use
it, so requiring MemorySSA is unnecessary.

This change doesn't have a practical compile-time impact by itself,
because subsequent passes still request MemorySSA.

Differential Revision: https://reviews.llvm.org/D108073
2021-08-16 20:34:18 +02:00
Kostya Kortchinsky 80ed75e7fb Revert "[NFC] Fix typos"
This reverts commit b7425e956b.
2021-08-16 11:13:05 -07:00
Rong Xu b7425e956b [NFC] Fix typos
s/senstive/senstive/g
2021-08-16 10:15:30 -07:00
Jordan Rupprecht 4357562067 [NFC][AArch64] Fix unused var in release build 2021-08-16 10:04:32 -07:00
Paul Robinson 94b4598d77 [PS4] stp[n]cpy not available on PS4 2021-08-16 09:06:52 -07:00
Craig Topper 92abb1cf90 [TypePromotion] Don't mutate the result type of SwitchInst.
SwitchInst should have a void result type.

Add a check to the verifier to catch this error.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D108084
2021-08-16 08:54:34 -07:00
Simon Pilgrim d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
Jeremy Morse 95fe61e639 Revert 54a61c94f9 and its follow up in 547b712500
These were part of D107823, however asan  has found something excitingly
wrong happening:

https://lab.llvm.org/buildbot/#/builders/5/builds/10543/steps/13/logs/stdio
2021-08-16 15:48:56 +01:00