Commit Graph

160922 Commits

Author SHA1 Message Date
Archibald Elliott b20fe2c25b [docs][AArch64] Label Features with Arm ARM Names
This patch adds the names of the Arm Architecture Reference Manual (ARM)
features to the corresponding Subtarget Features in the AArch64 backend
and target parser.

The aim of this is to make it clearer what architectural features a
subtarget feature might enable (so, which features a CPU must provide to
support that subtarget feature), and so make it easier to add new CPUs
in the future.

Differential Revision: https://reviews.llvm.org/D131257
2022-08-09 18:45:50 +01:00
Adrian Prantl 68f97d2f78 LiveDebugValues: Fix another crash related to unreachable blocks
This is a follow-up patch to D130999. In the test, the MIR contains an
unreachable MBB but the code attempts to look it up in MLocs. This
patch fixes this issue by checking for the default-constructed value.

rdar://97226240

Differential Revision: https://reviews.llvm.org/D131453
2022-08-09 10:34:57 -07:00
Sanjay Patel 926e7312b2 [InstCombine] fold usub.with.overflow to icmp when there's no use of the math value
https://alive2.llvm.org/ce/z/UE48FH

This is part of solving issue #56926.
2022-08-09 13:13:48 -04:00
Sanjay Patel 6bfe5361b7 [InstCombine] add helper function for extract of with-overflow-intrinsic; NFC
We can do more with these patterns, so this block is going to grow.
2022-08-09 12:38:11 -04:00
zhongyunde c2ab65ddaf [IndVars] Eliminate redundant type cast with different sizes
Deal with different sizes between the itofp and fptoi with
trunc or sext/zext, depend on D129756.
Fixes https://github.com/llvm/llvm-project/issues/55505.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129958
2022-08-09 23:59:42 +08:00
Simon Pilgrim ed162d455a [DAG] Avoid hasOneUse() calls if the cheaper !AssumeSingleUse test has already failed. NFC.
Very minor optimization, but every little helps..
2022-08-09 16:42:19 +01:00
Simon Pilgrim d79e7dc939 [DAG] SimplifyDemandedVectorElts - and/mul(x,y) - if a demanded element of y is known zero then we don't need to demand it in x
This fixes most of the remaining regressions from the fixes in rG293899c64b75
2022-08-09 16:24:08 +01:00
Yaxun (Sam) Liu e780648a15 [AMDGPU] Unify unreachable intrinsics
si-annotate-control-flow does depth first traversal of BB's of
a function to insert amdgcn if intrinsics for conditional
branches so that isel can generate correct instructions later.

si-annotate-control-flow checks whether the successor BB for the 'else'
branch of a conditional branch has been visited. If it has been
visited, si-annotate-control-flow assumes the conditional
branch has been handled and will not try to insert if intrinsic
for it.

This assumption is not correct when the IR contains multiple
unreachable BB's. Then 'if' intrinscs are not inserted and incorrect
ISA are generated.

This patch fixes the issue by let amdgpu-unify-divergent-exit-nodes
unify unreachables even if they are uniformly reached. In this way
the IR will not contain multiple exits, and structurizer is able to
structurize the IR containing one unified exit.

Reviewed by: Ruiling Song, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D131181

Fixes: SWDEV-343244
2022-08-09 10:23:32 -04:00
Nikita Popov f5ed0cb217 [RISCV] Add target feature to force-enable atomics
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.

This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code using
+forced-atomics and -forced-atomics are not ABI compatible if atomic
variables cross the ABI boundary.

For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.

This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.

Differential Revision: https://reviews.llvm.org/D130621
2022-08-09 16:04:46 +02:00
gonglingqin cf75ef460c [LoongArch] Add codegen support for ISD::ROTL and ISD::ROTR
Differential Revision: https://reviews.llvm.org/D131231
2022-08-09 19:39:17 +08:00
WANG Xuerui 7d48a9e1ae [LoongArch] Support register-register-addressed GPR loads/stores
Differential Revision: https://reviews.llvm.org/D131380
2022-08-09 19:13:36 +08:00
Alex Richardson 6db15a82cc [ARM] Use getSymbolPreferLocal() in GetARMGVSymbol
This allows relaxing some relocations to STT_SECTION symbol+offset
instead of emitting a relocation against a symbol.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131433
2022-08-09 09:53:47 +00:00
Alex Richardson 9a2b14afa0 [ARM] Emit local aliases (.Lfoo$local) for functions
ARMAsmPrinter::emitFunctionEntryLabel() was not calling the base class
function so the $local alias was not being emitted. This should not have
any function effect right now since ARM does not generate different code
for the $local symbols, but it could be improved in the future.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131392
2022-08-09 09:53:47 +00:00
Simon Pilgrim 2724143551 [DAG] canCreateUndefOrPoison - add freeze(ctpop(x)) -> ctpop(freeze(x)) and freeze(parity(x)) -> parity(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-09 10:10:29 +01:00
Nikita Popov 4ac00789e1 [RelLookupTableConverter] Bail on invalid pointer size (x32)
The RelLookupTableConverter pass currently only supports 64-bit
pointers.  This is currently enforced using an isArch64Bit() check
on the target triple. However, we consider x32 to be a 64-bit target,
even though the pointers are 32-bit. (And independently of that
specific example, there may be address spaces with different pointer
sizes.)

As such, add an additional guard for the size of the pointers that
are actually part of the lookup table.

Differential Revision: https://reviews.llvm.org/D131399
2022-08-09 09:36:39 +02:00
gonglingqin d3580c2eb6 [LoongArch] Add codegen support for not
Differential Revision: https://reviews.llvm.org/D131384
2022-08-09 14:05:09 +08:00
wanglei 8716513e65 [LoongArch] Implement branch analysis
This allows a number of optimisation passes to work.
E.g. BranchFolding and MachineBlockPlacement.

Differential Revision: https://reviews.llvm.org/D131316
2022-08-09 14:03:09 +08:00
WANG Xuerui f35cb7ba34 [LoongArch] Add codegen support for bswap
Differential Revision: https://reviews.llvm.org/D131352
2022-08-09 13:42:03 +08:00
Luo, Yuanke aaf6c7b05c [globalisel] Select register bank for DBG_VALUE
The register operand of DBG_VALUE is not selected to a proper register
bank in both AArch64 and X86. This would cause getRegClass crash after
global ISel. After discussion, we think the MIR should assume all
vritual register should be set proper register class after global ISel,
so this patch is to fix the gap of DBG_VALUE for AArch64 and X86.

Differential Revision: https://reviews.llvm.org/D129037
2022-08-09 13:11:51 +08:00
Chen Zheng 22e475f5ac [NFC] fix warning 2022-08-09 00:22:01 -04:00
Yuta Mukai 5357dd2f43 [MachinePipeliner] Fix Phi generation failure for large stages
The previous code overwrites VRMap for prologue stages during Phi
generation if a register spans many stages.
As a result, the wrong register is used as the one coming from
the prologue in Phis at later stages. (A process exists to correct
this, but it does not work in all cases.)
In addition, VRMap for prologue must be preserved until addBranches().

This patch fixes them by separating the map for Phis into a different
variable (VRMapPhi).

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D127840
2022-08-09 13:14:26 +09:00
Yuta Mukai 3f561996bf [AArch64] Fix and add A64FX scheduling resource/latency info
1. Missing instruction information (FTSSEL, FMSB, PFIRST and RDFFR)
   is added and CompleteModel is set to one.

2. Information for pseudo SVE instructions is added. Those
   instructions are present at the time of scheduling.

3. Resource and latency information for SVE instructions is modified
   to be more accurate.
   For example, the description for CMPEQ, which consumes one cycle
   each of unit FLA and PPR, is as follows.
```
Previous:
  def A64FXGI01 : ProcResGroup<[A64FXIPFLA, A64FXIPPR]>;
  def A64FXWrite_4Cyc_GI01 : SchedWriteRes<[A64FXGI01]> {...
Modified:
  def A64FXGI0 : ProcResGroup<[A64FXIPFLA]>;
  def A64FXGI1 : ProcResGroup<[A64FXIPPR]>;
  def A64FXWrite_CMP : SchedWriteRes<[A64FXGI0, A64FXGI1]> {...
```

Reference: A64FX Microarchitecture Manual (Table 16-3)
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.7.pdf

Reviewed By: dmgreen, kawashima-fj

Differential Revision: https://reviews.llvm.org/D131165
2022-08-09 10:53:40 +09:00
Chen Zheng d9004dfbab [PowerPC] mapping hardward loop intrinsics to powerpc pseudo
Map hardware loop intrinsics loop_decrement and set_loop_iteration
to the new PowerPC pseudo instructions, so that the hardware loop
intrinsics will be expanded to normal cmp+branch form or ctrloop
form based on the CTR register usage on MIR level.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D123366
2022-08-08 21:34:20 -04:00
Martin Storsjö 0c52ab3968 [MC] [Win64EH] Fix the calculation of the end of epilogs
Exclude the terminating end opcode from the epilog - it doesn't
correspond to an actual instruction that is included in the epilog
itself (within the .seh_startepilogue/.seh_endepilogue range).

In most (all?) cases, an epilog is followed by a matching terminating
instruction though (a ret or a branch to a tail call), but it's not
strictly within the .seh_startepilogue/.seh_endepilogue range.

This fixes a number of failed asserts in cases where the codegen
has incorrectly reoredered SEH opcodes so they don't match up
exactly with their instructions.

However this still just avoids failing the assertion; the root cause
of generating unexpected epilogs is still present (and fixing that is
a less obvious issue).

Differential Revision: https://reviews.llvm.org/D131393
2022-08-08 23:03:17 +03:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Ruobing Han f756f06cc4 [SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops
With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.

Reviewed By: aeubanks, asbirlea

Differential Revision: https://reviews.llvm.org/D129599
2022-08-08 18:12:04 +00:00
Daniel Thornburgh bf48b128b0 [Symbolizer] Implement pc element in symbolizing filter.
Implements the pc element for the symbolizing filter, including it's
"ra" and "pc" modes. Return addresses ("ra") are adjusted by
decrementing one. By default, {{{pc}}} elements are assumed to point to
precise code ("pc") locations. Backtrace elements will adopt the
opposite convention.

Along the way, some minor refactors of value printing and colorization.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D131115
2022-08-08 11:08:48 -07:00
Vang Thao 257251247a [SROA] Try harder to find a vector promotion viable type when rewriting
We are seeing significant performance loss when an alloca fails to get promoted
to register. I have observed that this is due to the common type found when
attempting to rewrite partition users being unviable for promotion. While if we
would have continue looking for a type, we would have found a subtype in the
original allocated type that would have enabled promotion. Thus first check if
the initial common type found is promotion viable and if not then continue
looking instead of stopping with the initial common type found.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128073
2022-08-08 11:04:01 -07:00
Craig Topper e2bfbed2bb [RISCV] Add ReadFStoreData as a SchedRead.
The floating point stores use a different register class, it
probably makes sense to have a different SchedRead.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D131379
2022-08-08 09:33:19 -07:00
Simon Pilgrim 6f2bee667a [DAG] canCreateUndefOrPoison - add freeze(bswap(x)) -> bswap(freeze(x)) and freeze(bitreverse(x)) -> bitreverse(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-08 17:27:17 +01:00
Simon Pilgrim e4b2c52420 [DAG] canCreateUndefOrPoison - add freeze(sext(x)) -> sext(freeze(x)) and freeze(zext(x)) -> zext(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-08 16:43:40 +01:00
Arthur Eubanks 81c4e58e2a [StandardInstrumentations] Handle case where block order changes
Previously we'd go off the end of the BI iterator because we expected
that the relative positions of common blocks before and after were
consistent. That's not always true though, for example with
jump-threading.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D130596
2022-08-08 07:41:39 -07:00
Krzysztof Parzyszek 0f5385b70e Recommit [RDF] Remove explicit template arguments from Print
The build breakages should be addressed by d4abdd2e3d:
  [CMake] Check CMAKE_CXX_STANDARD and error if it's to old

Thanks to Tobias and Roy for addressing these issues.
2022-08-08 07:28:45 -07:00
Simon Pilgrim 9641a201a5 [DAG] Add initial SelectionDAG::canCreateUndefOrPoison support
This patch adds basic support for a DAG variant of the canCreateUndefOrPoison call and updates DAGCombiner::visitFREEZE to use it, further Opcodes (including target specific Opcodes) can be handled when we have test coverage.

So far, I've left visitFREEZE to just use this for unary nodes (which currently means the existing BITCAST/FREEZE cases) - later patches will add other unary opcodes (with test coverage) and we can also refactor visitFREEZE to support a general number of operands like we do in InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. Similarly we will be able to handle poison generating SDNodeFlags as and when it becomes an issue.

Part of the work for D106675 / PR50468

Differential Revision: https://reviews.llvm.org/D130646
2022-08-08 15:16:06 +01:00
Tobias Hieta 576375a2d6
[LLD][COFF] Ignore DEBUG_S_XFGHASH_TYPE/VIRTUAL
These are new debug types that ships with the latest
Windows SDK and would warn and finally fail lld-link.

The symbols seems to be related to Microsoft's XFG
which is their version of CFG. We can't handle any of
this yet, so for now we can just ignore these types
so that lld doesn't fail with a new version of Windows
SDK.

Fixes: #56285

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D129378
2022-08-08 15:53:52 +02:00
Simon Pilgrim 9ea54ac9ce [X86] X86ISelDAGToDAG.cpp - use auto for all values derived from cast/dyn_cast (style). NFC. 2022-08-08 14:35:06 +01:00
Denis Antrushin 36cc533471 [EarlyCSE][OpaquePointers]Replace assert with return for mask type check.
When EarlyCSE tries to common vector masked loads/stores, it first checks that
they have same base operand and then assumes that this is enough for mask types
to be equal. This is true for typed pointers but false for opaque ones -
two loads of different vector sizes from same base pointer '%b' are the same,
`ptr %b`. (For typed pointers, `%b` was cast to vector pointer type so bases
were different).
Change assert to return from lambda `isSubmask` so this transformation properly
works with opaque pointers.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D131251
2022-08-08 16:14:42 +03:00
Simon Pilgrim b334709467 Remove superfluous ; outside of a function 2022-08-08 12:14:03 +01:00
Shubham Narlawar ab4fc87a9d [DAG] Emit table lookup from TargetLowering::expandCTTZ()
This patch emits table lookup in expandCTTZ.

Context -
https://reviews.llvm.org/D113291 transforms set of IR instructions to
cttz intrinsic but there are some targets which does not support CTTZ or
CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().

Differential Revision: https://reviews.llvm.org/D128911
2022-08-08 12:08:05 +01:00
Simon Pilgrim e5e93b6130 [DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding
FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks).

This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero.

The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead?

Differential Revision: https://reviews.llvm.org/D130839
2022-08-08 11:53:56 +01:00
Simon Tatham 72017e9b16 [llvm-objdump,ARM] Fix big-endian AArch32 disassembly.
The ABI for big-endian AArch32, as specified by AAELF32, is above-
averagely complicated. Relocatable object files are expected to store
instruction encodings in byte order matching the ELF file's endianness
(so, big-endian for a BE ELF file). But executable images can
//either// do that //or// store instructions little-endian regardless
of data and ELF endianness (to support BE32 and BE8 platforms
respectively). They signal the latter by setting the EF_ARM_BE8 flag
in the ELF header.

(In the case of the Thumb instruction set, this all means that each
16-bit halfword of a Thumb instruction is stored in one or other
endianness. The two halfwords of a 32-bit Thumb instruction must
appear in the same order no matter what, because the first halfword is
the one that must avoid overlapping the encoding of any 16-bit Thumb
instruction.)

llvm-objdump was unconditionally expecting Arm instructions to be
stored little-endian. So it would correctly disassemble a BE8 image,
but if you gave it a BE32 image or a BE object file, it would retrieve
every instruction in byte-swapped form and disassemble it to
nonsense. (Even an object file output by LLVM itself, because
ARMMCCodeEmitter outputs instructions big-endian in big-endian mode,
which is correct for writing an object file.)

This patch allows llvm-objdump to correctly disassemble all three of
those classes of Arm ELF file. It does it by introducing a new
SubtargetFeature for big-endian instructions, setting it from the ELF
image type and flags during llvm-objdump setup, and teaching both
ARMDisassembler and llvm-objdump itself to pay attention to it when
retrieving instruction data from a section being disassembled.

Differential Revision: https://reviews.llvm.org/D130902
2022-08-08 10:49:51 +01:00
Anubhab Ghosh 1eee6de873 [Orc][JITLink] Slab based memory allocator to reduce RPC calls
Calling reserve() used to require an RPC call. This commit allows large
ranges of executor address space to be reserved. Subsequent calls to
reserve() will return subranges of already reserved address space while
there is still space available.

Differential Revision: https://reviews.llvm.org/D130392
2022-08-08 15:13:41 +05:30
David Green 061e0189a3 [DAG] Ensure Legal BUILD_VECTOR elements types in shuffle->And combine
D129150 added a combine from shuffles to And that creates a BUILD_VECTOR
of constant elements. We need to ensure that the elements are of a legal
type, to prevent asserts during lowering.

Fixes #56970.

Differential Revision: https://reviews.llvm.org/D131350
2022-08-08 09:47:55 +01:00
Cullen Rhodes a6dec9f5b2 [AArch64][SVE] Add patterns to select masked FP arith
Add patterns to select predicated instructions when lowering:

  fadd(a, select(mask, b, splat(0)))
  fsub(a, select(mask, b, splat(0)))

'fadd' is unsafe unless no-signed zeros fast-math flag is set, since

  -0.0 + 0.0 = 0.0

changes the sign. Alive2: https://alive2.llvm.org/ce/z/wbhJh_

Also adds FMA patterns for:

  fadd(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)
  fsub(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)

These patterns require the 'contract' fast-math flag to be set, and the
fadd 'nsz' as above.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130564
2022-08-08 08:44:13 +00:00
Carlos Alberto Enciso a3f7a2c183 [CodeView] Add function to get size in bytes for TypeIndex/CVType.
Given a TypeIndex or CVType return its size in bytes. Basically it
is the inverse to 'CodeViewDebug::lowerTypeBasic', that returns a
TypeIndex based in a size.

Reviewed By: rnk, djtodoro

Differential Revision: https://reviews.llvm.org/D129846
2022-08-08 08:48:23 +01:00
Kazu Hirata e20d210eef [llvm] Qualify auto (NFC)
Identified with readability-qualified-auto.
2022-08-07 23:55:27 -07:00
Kazu Hirata 0e37ef0186 [Transforms] Fix comment typos (NFC) 2022-08-07 23:55:24 -07:00
wanglei 0c2b738f8f [LoongArch] Support for varargs
This patch ensures the `$fp` always points to the bottom of the vararg
spill region.
Includes support for expand `ISD::DYNAMIC_STACKALLOC`.

Differential Revision: https://reviews.llvm.org/D130250
2022-08-08 14:01:24 +08:00
Sheng 64d326c33c [M68k] Add MC support for link/unlk
Reviewers: myhsu

Differential Revision: https://reviews.llvm.org/D125444
2022-08-08 11:00:11 +08:00
Jun Zhang 98339ac7af
[Support] move llvm::llvm_is_multithread to header, NFC
This allow optimization without LTO. Also remove some useless else-ifs.
Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D131313
2022-08-08 08:49:02 +08:00
Sanjay Patel 74b5e797d5 [InstSimplify] fold scalable vectors with over-shift splat constant to poison
Fixes #56968
2022-08-07 16:26:05 -04:00
Lang Hames 41c41fcbc0 Revert "[JITLink] Fix some C++17 related fixmes."
This reverts commit 6ea5bf436a.

6ea5bf436a made use of new c++17 rules regarding
order of evaluation (specifically: in function calls the expression naming the
function should be sequenced before the evalution of any operands) to simplify
some continuation-passing calls. Unfortunately this appears to break at least
one MSVC bot: https://lab.llvm.org/buildbot/#/builders/123/builds/12149 .

Includes an update to the comments to note that the workaround is now based on
MSVC limitations, not on LLVM adopting c++17.
2022-08-07 12:15:59 -07:00
Sanjay Patel 8148c28fad [ConstFolding] fix overzealous assert when converting FP half
Fixes #56981
2022-08-07 13:34:51 -04:00
Lang Hames 6ea5bf436a [JITLink] Fix some C++17 related fixmes. 2022-08-07 09:37:14 -07:00
Aaron Ballman 32fd0b7fd5 Revert "[RDF] Remove explicit template arguments from Print"
This reverts commit ede96de751.

This breaks the build on Windows with Visual Studio:
https://lab.llvm.org/buildbot/#/builders/123/builds/12134
2022-08-07 08:24:01 -04:00
Kazu Hirata ba0407ba86 [llvm] Use range-based for loops (NFC) 2022-08-07 00:16:21 -07:00
Kazu Hirata 54199d805a [x86] Remove unused declaration processWaitCnt (NFC)
The declaration was introduced without a corresponding definition on
Jan 2, 2022 in commit 85e6e748d4.
2022-08-07 00:16:19 -07:00
Kazu Hirata d0ec61c9ff [Target] Remove unused forward declarations (NFC) 2022-08-07 00:16:16 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Kazu Hirata 3b114087c3 [llvm] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2022-08-07 00:16:11 -07:00
Fangrui Song fa66789d06 [llvm] LLVM_NODISCARD => [[nodiscard]]. NFC
With C++17 there is no Clang pedantic warning.
2022-08-07 00:26:33 +00:00
Fangrui Song 5deb678289 Revert "[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508"
This reverts commit 48c74bb2e2.
With C++17 the workaround is no longer needed.
2022-08-06 16:48:23 -07:00
Krzysztof Parzyszek 2bc390bdd6 [RDF] Use default TargetOperandInfo if not given in constructor
All current in-tree users use the default implementation.
2022-08-06 14:32:52 -05:00
Krzysztof Parzyszek ede96de751 [RDF] Remove explicit template arguments from Print
CTAD takes care of it.
2022-08-06 13:29:15 -05:00
Kazu Hirata c8e6ebd74e Use value instead of getValue (NFC) 2022-08-06 11:21:39 -07:00
Chen Zheng ef60e44fe8 [PowerPC] fix stack size allocated for float point argument
This is for https://github.com/llvm/llvm-project/issues/56469

Allocate 4 bytes for float point arguments on PPC32.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D129558
2022-08-06 08:38:52 -04:00
Markus Böck f7b73b7e8e [llvm] Remove uses of deprecated `std::iterator`
std::iterator has been deprecated in C++17 and some standard library implementations such as MS STL or libc++ emit deperecation messages when using the class.
Since LLVM has now switched to C++17 these will emit warnings on these implementations, or worse, errors in build configurations using -Werror.

This patch fixes these issues by replacing them with LLVMs own llvm::iterator_facade_base which offers a superset of functionality of std::iterator.

Differential Revision: https://reviews.llvm.org/D131320
2022-08-06 14:07:37 +02:00
Leon Clark 6a275cd53c Transform illegal intrinsics to V_ILLEGAL
Related tasks:

- SWDEV-240194
- SWDEV-309417
- SWDEV-334876

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123693
2022-08-06 08:59:00 +01:00
Filipp Zhinkin c55899f763 [DAGCombiner] Hoist funnel shifts from logic operation
Hoist funnel shift from logic op:
logic_op (FSH x0, x1, s), (FSH y0, y1, s) --> FSH (logic_op x0, y0), (logic_op x1, y1), s

The transformation improves code generated for some cases related to
issue https://github.com/llvm/llvm-project/issues/49541.

Reduced amount of funnel shifts can also improve throughput on x86 CPUs by utilizing more
available ports: https://quick-bench.com/q/gC7AKkJJsDZzRrs_JWDzm9t_iDM

Transformation correctness checks:
https://alive2.llvm.org/ce/z/TKPULH
https://alive2.llvm.org/ce/z/UvTd_9
https://alive2.llvm.org/ce/z/j8qW3_
https://alive2.llvm.org/ce/z/7Wq7gE
https://alive2.llvm.org/ce/z/Xr5w8R
https://alive2.llvm.org/ce/z/D5xe_E
https://alive2.llvm.org/ce/z/2yBZiy

Differential Revision: https://reviews.llvm.org/D130994
2022-08-05 17:02:22 -04:00
Lang Hames bc062e034f [ORC] Fix a memory leak in LLVMOrcIRTransformLayerSetTransform.
This function heap-allocates a ThreadSafeModule (the current C bindings assume
that TSMs are always heap-allocated), but was failing to free it.

Should fix http://llvm.org/PR56953.
2022-08-05 13:52:03 -07:00
Vitaly Buka 8d2901d537 [NFC][Inliner] Add Load/Store handler
This is an additional signal which may benefit sanitizers.

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D131129
2022-08-05 13:42:17 -07:00
Craig Topper 75c64c7c4e [RISCV] Don't use li+sh3add for constants that can use lui+add.
If we're adding a constant that can't use addi we try a few tricks,
one of which is using li+sh3add. We should not do this if lui+add
would work. For example adding 8192. Using sh3add prevents folding
a sext.w to form addw, thus increasing instruction count.
2022-08-05 12:47:03 -07:00
Philip Reames 9a9848f4b9 [RISCVInsertVSETVLI] Remove an unsound optimization
This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem:

vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo
vsetivli a2, a1, e32, m4, ta, mu

With the unsound result being:

vsetivli a1, a0, e32, m1, ta, mu
vsetivli a2, a0, e32, m4, ta, mu

Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively.

Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16

After the unsound optimization: a1 = 16 and a2 = 33

This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound.

This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own.

Differential Revision: https://reviews.llvm.org/D131264
2022-08-05 12:13:08 -07:00
Zhaoshi Zheng 99e50e5838 [WinEH][ARM64] Split Unwind Info for Fucntions Larger than 1MB
Create function segments and emit unwind info of them.

A segment must be less than 1MB and no prolog or epilog is splitted between two
segments.

This patch should generate correct, though not optimal, unwind info for large
functions. Currently it only generate pacted info (.pdata) only for functions
that are less than 1MB (single-segment functions). This is NFC from before this
patch.

The next step is to enable (.pdata) only unwind info for the first segment or
segments that have neither prolog or epilog in a multi-segment function.

Another future work item is to further split segments that require more than 255
code words or have more than 65535 epilogs.

Reference:
https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling#function-fragments

Differential Revision: https://reviews.llvm.org/D130049
2022-08-05 11:46:41 -07:00
Sanjay Patel b63fc26d33 [InstSimplify] make uses of isImpliedCondition more efficient (NFCI)
As suggested in the post-commit comments for 019d76196f,
this makes the usage symmetric with the 'and' patterns and should
be more efficient.
2022-08-05 12:06:47 -04:00
Paul Walker 0533c39a76 [SVE] Expand DUPM patterns to handle all integer vector types.
NOTE: i8 vector splats are ignored because the immediate range of
DUP already has full coverage.

Differential Revision: https://reviews.llvm.org/D131078
2022-08-05 16:00:08 +00:00
Sanjay Patel 019d76196f [InstSimplify] use isImpliedCondition() instead of semi-duplicated code
We get a couple of improvements from recognizing swapped
operand patterns that were not handled by the replicated
code.

This should also enable simplifying larger patterns as
seen in issue #56653 and issue #56654, but that requires
enhancements to isImpliedCondition() itself.
2022-08-05 10:59:09 -04:00
Mirko Brkusanin 19bb535ed9 [AMDGPU] Remove unused MIMG tablegen variants
There are no AMDGPUSampleVariant versions for _G16, it is treated more like a
modifier for derivatives (_D) (also for intrinsics where it is overloaded type
instead of part of instrinsic name) so we ended up making more variants for
these instruction then we actually needed.

32-bit derivatives need 6 dwords at most, while 16-bit need 4 at most. Using
same AMDGPUSampleVariant for both, we ended up creating 2 extra variants per
instruction than were necessary.

In total this deletes 260 unused tablegen records.

Differential Revision: https://reviews.llvm.org/D131252
2022-08-05 15:30:47 +02:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
David Green b2de84633a [ConstProp] Don't fallthorugh for poison constants on vctp and active_lane_mask.
Given a poison constant as input, the dyn_cast to a ConstantInt would
fail so we would fall through to the generic code that attempts to fold
each element of the input vectors. The inputs to these intrinsics are
not vectors though, leading to a compile time crash. Instead bail out
properly for poison values by returning nullptr. This doesn't try to
define what poison means for these intrinsics.

Fixes #56945
2022-08-05 11:19:36 +01:00
David Spickett c401dbde71 [llvm][IROutliner] Account for return void in sort comparator
This fixes 69 llvm tests that failed when EXPENSIVE_CHECKS was enabled.
llvm/test/Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
is one example.

When we have EXPENSIVE_CHECKS, _GLIBCXX_DEBUG is defined. This means
that libstdc++ will call the compare function to check if it is
implemented correctly (that !(a < a) is true).

This happens even if there is only one item and here, we expect
to see one return void or multiple return constant integer.

Don't sort if we have 1 item, but do assert that it is the 1
ret void we expect. In the comparator, assert that neither
Value is a nullptr in case one ended up in a the list somehow.

Reviewed By: AndrewLitteken

Differential Revision: https://reviews.llvm.org/D130230
2022-08-05 09:36:43 +00:00
Phoebe Wang 2312b747b8 [X86] Move getting module flag into `runOnMachineFunction` to reduce compile-time. NFCI
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D131245
2022-08-05 01:58:17 -07:00
wanglei 57eb77d411 [LoongArch] Implement more of the ABI
According to the description of the LoongArch abi documentation,
(https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html)
the calling convention of LoongArch is almost the same as the RISCV's
(except for the vector part), so we borrow the implementation of RISCV.

This patch only guarantees the correctness of lp64d, because only the
part of lp64d is described in detail in the documentation.

Differential Revision: https://reviews.llvm.org/D130249
2022-08-05 15:14:16 +08:00
Chuanqi Xu 230d6f93aa [Coroutines] Remove lifetime intrinsics for spliied allocas in coroutine frames
Closing https://github.com/llvm/llvm-project/issues/56919

It is meaningless to preserve the lifetime markers for the spilled
allocas in the coroutine frames and it would block some optimizations
too.
2022-08-05 14:50:43 +08:00
David Green 38c2366b3f [AArch64][GlobalISel] Recognise some CCMPri
This is a simple addition to emitConditionalComparison, to match CCMP
with immediates using getIConstantVRegValWithLookThrough, letting it
select the CCMPri variants of the instructions.

Differential Revision: https://reviews.llvm.org/D131073
2022-08-05 07:48:42 +01:00
Paul Kirth a812b39e8c [llvm][ir] Add missing license to ProfDataUtils
We failed to add these in D128860 or D128858

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131226
2022-08-05 03:39:13 +00:00
Phoebe Wang 7f648d27a8 Reland "[X86][MC] Always emit `rep` prefix for `bsf`"
`BMI` new instruction `tzcnt` has better performance than `bsf` on new
processors. Its encoding has a mandatory prefix '0xf3' compared to
`bsf`. If we force emit `rep` prefix for `bsf`, we will gain better
performance when the same code run on new processors.

GCC has already done this way: https://c.godbolt.org/z/6xere6fs1

Fixes #34191

Reviewed By: craig.topper, skan

Differential Revision: https://reviews.llvm.org/D130956
2022-08-05 10:22:48 +08:00
Fangrui Song 7d6017fd31 [TTI] Change new getVectorInstrCost overload to use const reference after D131114
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131197
2022-08-04 15:16:51 -07:00
Sanjay Patel 8e7acb670b [ValueTracking] improve readability in isImpliedCond helper functions; NFC
This matches the caller code naming scheme and avoids the
potentially confusing transition from left/right to A/B.
2022-08-04 17:43:31 -04:00
Craig Topper 12a1ca9c42 [RISCV] Relax another one use restriction in performSRACombine.
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.

After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).

This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
2022-08-04 14:32:31 -07:00
Sanjay Patel 657bfa364f [ValueTracking] reduce code in isImpliedCondICmps; NFC
This copies the implementation of the subsequent match with constants.
2022-08-04 17:03:42 -04:00
Arthur Eubanks 6e45162adf [InstrProf] Set prof global variables to internal linkage if adding a comdat
COFF has a verifier check that private global variables don't have a comdat of the same name.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D131043
2022-08-04 13:24:55 -07:00
Mingming Liu bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
Johannes Doerfert f81a209337 [Attributor][FIX] Deal with implicit `undef` in AAPotentialConstantValues.
In contrast to AAPotentialValues, the constant values version can
contain implicit `undef` in the set. We had an assertion that could
misfire before. Handle it properly now.
2022-08-04 14:44:51 -05:00
Craig Topper a2de12c987 [RISCV] Relax a one use restriction performSRACombine
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
ignore the use count on the (shl X, 32).

The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
2022-08-04 11:25:08 -07:00
Daniel Thornburgh 22df238d4a [Symbolizer] Implement data symbolizer markup element.
This connects the Symbolizer to the markup filter and enables the first
working end-to-end flow using the filter.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D130187
2022-08-04 10:20:29 -07:00
Ellis Hoag 12e78ff881 [InstrProf] Add the skipprofile attribute
As discussed in [0], this diff adds the `skipprofile` attribute to
prevent the function from being profiled while allowing profiled
functions to be inlined into it. The `noprofile` attribute remains
unchanged.

The `noprofile` attribute is used for functions where it is
dangerous to add instrumentation to while the `skipprofile` attribute is
used to reduce code size or performance overhead.

[0] https://discourse.llvm.org/t/why-does-the-noprofile-attribute-restrict-inlining/64108

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D130807
2022-08-04 08:45:27 -07:00
Joshua Cranmer 2138c90645 [IR] Move support for dxil::TypedPointerType to LLVM core IR.
This allows the construct to be shared between different backends. However, it
still remains illegal to use TypedPointerType in LLVM IR--the type is intended
to remain an auxiliary type, not a real LLVM type. So no support is provided for
LLVM-C, nor bitcode, nor LLVM assembly (besides the bare minimum needed to make
Type->dump() work properly).

Reviewed By: beanz, nikic, aeubanks

Differential Revision: https://reviews.llvm.org/D130592
2022-08-04 10:41:11 -04:00
Lorenzo Albano 74940d2668 [VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D121114
2022-08-04 16:12:01 +02:00
Martin Storsjö 46bc1b5689 [ORC] Actually propagate memory unmapping errors on Windows
This fixes warnings like these:

    ../lib/ExecutionEngine/Orc/MemoryMapper.cpp:364:9: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
            joinErrors(std::move(Err),
            ^~~~~~~~~~ ~~~~~~~~~~~~~~~

Differential Revision: https://reviews.llvm.org/D131056
2022-08-04 11:14:52 +03:00
Martin Storsjö 46196db4d3 [ORC] Fix a warning about an unused variable on Windows. NFC.
Differential Revision: https://reviews.llvm.org/D131055
2022-08-04 11:14:52 +03:00
wanglian b6b0690355 [LegalizeTypes][VP] Add split operand support for VP float and integer casting
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D130685
2022-08-04 15:41:50 +08:00
jacquesguan b61cfc91ea [RISCV] Add cost modelling for vector widenning reduction.
In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice of the input vector's SEW-width. In this situation, the cost of extended add reduction should be same as single-width add reduction. So as the vector float widenning reduction.

Differential Revision: https://reviews.llvm.org/D129994
2022-08-04 15:31:31 +08:00
Phoebe Wang 6f867f9102 [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk
This is to address feature request from https://github.com/ClangBuiltLinux/linux/issues/1665

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D130754
2022-08-04 15:12:15 +08:00
Thomas Lively b19de814ad [WebAssembly] Improve codegen for v128.bitselect
Add patterns selecting ((v1 ^ v2) & c) ^ v2 and ((v1 ^ v2) & ~c) ^ v2 to
v128.bitselect.

Resolves #56827.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D131131
2022-08-03 23:28:37 -07:00
Craig Topper 91e8079cd5 [X86] Teach PostprocessISelDAG to fold ANDrm+TESTrr when chain result is used.
The isOnlyUserOf prevented the fold if the chain result had any
users. What we really care about is the the data result from the
AND is only used by the TEST, and the flags results from the ANDs
aren't used at all. It's ok if the chain has users, we just need
to replace those users with the chain from the TESTrm.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D131117
2022-08-03 21:00:22 -07:00
Arthur Eubanks 203296d642 [BoundsChecking] Fix merging of sizes
BoundsChecking uses ObjectSizeOffsetEvaluator to keep track of the
underlying size/offset of pointers in allocations.  However,
ObjectSizeOffsetVisitor (something ObjectSizeOffsetEvaluator
uses to check for constant sizes/offsets)
doesn't quite treat sizes and offsets the same way as
BoundsChecking.  BoundsChecking wants to know the size of the
underlying allocation and the current pointer's offset within
it, but ObjectSizeOffsetVisitor only cares about the size
from the pointer to the end of the underlying allocation.

This only comes up when merging two size/offset pairs. Add a new mode to
ObjectSizeOffsetVisitor which cares about the underlying size/offset
rather than the size from the current pointer to the end of the
allocation.

Fixes a false positive with -fsanitize=bounds.

Reviewed By: vitalybuka, asbirlea

Differential Revision: https://reviews.llvm.org/D131001
2022-08-03 17:21:19 -07:00
Vitaly Buka a2aa6809a8 [NFC][Inliner] Add cl::opt<int> to tune InstrCost
The plan is tune this for sanitizers.

Differential Revision: https://reviews.llvm.org/D131123
2022-08-03 17:14:10 -07:00
Congzhe Cao 8dc4b2edfa [LoopInterchange][PR56275] Fix legality with negative dependence vectors
This is the 2nd patch of the two-patch series (D130188, D130189) that
fix PR56275 (https://github.com/llvm/llvm-project/issues/56275) which
is a missed opportunity for loop interchange.

As follow-up on the dependence analysis (DA) patch D130188, this patch
normalizes DA results in loop interchange, such that negative dependence
vectors queried by loop interchange are reversed to be non-negative.

Now all tests in PR56275 can get interchanged. Those tests are added
in lit test as `pr56275.ll`.

Reviewed By: kawashima-fj, bmahjour, Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D130189
2022-08-03 19:59:01 -04:00
Congzhe Cao 76be554931 [DependenceAnalysis][PR56275] Normalize negative dependence analysis results
This patch is the first of the two-patch series (D130188, D130179) that
resolve PR56275 (https://github.com/llvm/llvm-project/issues/56275)
which is a missed opportunity, where a perfrectly valid case for loop
interchange failed interchange legality.

If the distance/direction vector produced by dependence analysis (DA) is
negative, it needs to be normalized (reversed). This patch provides helper
functions `isDirectionNegative()` and `normalize()` in DA that does the
normalization, and clients can query DA to do normalization if needed.

A pass option `<normalized-results>` is added to DependenceAnalysisPrinterPass,
and we leverage it to update DA test cases to make sure of test coverage. The
test cases added in `Banerjee.ll` shows that negative vectors are normalized
with `print<da><normalized-results>`.

Reviewed By: bmahjour, Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D130188
2022-08-03 19:59:00 -04:00
Bill Wendling 239c831de4 Add switch to use "source_filename" instead of a hash ID for globally promoted local
During LTO a local promoted to a global gets a unique suffix based on
a hash of the module IR. This means that changes in the local's module
can affect the contents in another module that imported it (because the name
of the imported promoted local is changed, but that doesn't reflect a
real change in the importing module). So any tool that's
validating changes to the importing module will see a superficial change.

Instead of using the module hash, we can use the "source_filename" if it
exists to generate a unique identifier that doesn't change due to LTO
shenanigans.

Differential Revision: https://reviews.llvm.org/D128863
2022-08-03 16:41:56 -07:00
Mircea Trofin 0cb9746a7d [nfc][mlgo] Separate logger and training-mode model evaluator
This just shuffles implementations and declarations around. Now the
logger and the TF C API-based model evaluator are separate.

Differential Revision: https://reviews.llvm.org/D131116
2022-08-03 16:20:28 -07:00
Craig Topper 53d560b22f [RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.

To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.

Fixes PR56905.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131113
2022-08-03 15:19:07 -07:00
Vitaly Buka 26dd42705c [NFC][Inliner] Simplify clamping in addCost 2022-08-03 14:54:37 -07:00
Craig Topper 84e9194828 Revert "[X86][MC] Always emit `rep` prefix for `bsf`"
This reverts commit c2066d19cd.

It's causing failures on the build bots.
2022-08-03 14:51:34 -07:00
Chris Bieneman ce0bb316eb [DX] [NFC] Move hasSection check up
Juming out earlier if the global doesn't have a section is just a
cleaner early out.
2022-08-03 15:54:53 -05:00
Vitaly Buka e056e74dda [NFC][inline] Add const to an argument 2022-08-03 13:20:47 -07:00
Craig Topper ff91b2d9df [X86] Promote i16 CTTZ/CTTZ_ZERO_UNDEF always.
If we're going to emit a rep prefix before bsf as proposed in
D130956, it makes sense to promote i16 operations to i32 to avoid
the false depedency of tzcntw.

Reviewed By: skan, pengfei

Differential Revision: https://reviews.llvm.org/D130995
2022-08-03 13:12:20 -07:00
Adrian Prantl 905f2d1ecb Fix LDV InstrRefBasedImpl to not crash when encountering unreachable MBBs.
The testcase was delta-reduced from an LTO build with sanitizer
coverage and the MIR tail duplication pass caused a machine basic
block to become unreachable in MIR. This caused the MBB to be invisible
to the reverse post-order traversal used to initialize the MBB <->
RPONumber lookup tables.

rdar://97226240

Differential Revision: https://reviews.llvm.org/D130999
2022-08-03 13:05:05 -07:00
Felipe de Azevedo Piovezan a5a8a05c78 [SelectionDAG] Handle IntToPtr constants in dbg.value
The function `handleDebugValue` has custom logic to handle certain kinds
constants, namely integers, floats and null pointers. However, it does
not handle constant pointers created from IntToPtr ConstantExpressions.
This patch addresses the issue by replacing the Constant with its
integer operand.

A similar bug was addressed for GlobalISel in D130642.

Reviewed By: aprantl, #debug-info

Differential Revision: https://reviews.llvm.org/D130908
2022-08-03 14:10:05 -04:00
Nicolai Hähnle 5c7c83885f Revert "ManagedStatic: remove from DynamicLibrary"
This reverts commit 38817af6a7.

Buildbots report a Windows build error. Revert until I can look at it
more carefully.
2022-08-03 17:56:46 +02:00
Philip Reames 569a7f6aa3 [LV] Move definition of isPredicatedInst out of line and make it const [nfc] 2022-08-03 08:53:11 -07:00
Nicolai Hähnle 38817af6a7 ManagedStatic: remove from DynamicLibrary
Differential Revision: https://reviews.llvm.org/D129127
2022-08-03 17:43:52 +02:00
Philip Reames a1cab0daae [LV] Use cost base decision for uniform mem op strategy [nfc-ish]
This is mostly a stylistic change to make the uniform memop widening cost
code fit more naturally with the sourounding code.  Its not strictly
speaking NFC as I added in the store with invariant value case, and we
could in theory have a target where a gather/scatter is cheaper than a
single load/store... but it's probably NFC in practice.  Note that the
scatter/gather result can still be overriden later if the result is
uniform-by-parts.
2022-08-03 07:47:24 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Alex Bradbury 28f12a09ae [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.

Differential Revision: https://reviews.llvm.org/D130191
2022-08-03 13:41:58 +01:00
Nicolai Hähnle d4cab87094 ManagedStatic: remove from CrashRecoveryContext
Differential Revision: https://reviews.llvm.org/D129126
2022-08-03 14:35:30 +02:00
Dmitry Preobrazhensky 05b3aadfff [AMDGPU][MC][GFX11] Correct v_dot2_f16_f16 and v_dot2_bf16_bf16
Enable SGPRs for the following operands of these opcodes:

- src operands of VOP3 variant.
- src2 operand of DPP variants.

Differential Revision: https://reviews.llvm.org/D130989
2022-08-03 15:08:23 +03:00
Dmitry Preobrazhensky ae553f9e49 [AMDGPU][MC][GFX10] Correct encoding of VOP3 v_cmpx* opcodes
Encode dst=EXEC but allow disassembler accept any dst value.

Differential Revision: https://reviews.llvm.org/D130978
2022-08-03 15:03:44 +03:00
Nicolai Hähnle 4cf0a9d4ae ManagedStatic: remove from Interpreter/ExternalFunctions
Differential Revision: https://reviews.llvm.org/D129124
2022-08-03 13:29:24 +02:00
Johannes Reifferscheid 3e9e43b48e Fix compiler error: init-statements in if/switch.
Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D131061
2022-08-03 11:36:41 +02:00
Fraser Cormack 646e2f4803 [VP] Rename VP int<->float conversion ISD opcodes
These should be named like the non-VP versions for consistency.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130967
2022-08-03 10:04:38 +01:00
Phoebe Wang c2066d19cd [X86][MC] Always emit `rep` prefix for `bsf`
`BMI` new instruction `tzcnt` has better performance than `bsf` on new
processors. Its encoding has a mandatory prefix '0xf3' compared to
`bsf`. If we force emit `rep` prefix for `bsf`, we will gain better
performance when the same code run on new processors.

GCC has already done this way: https://c.godbolt.org/z/6xere6fs1

Fixes #34191

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D130956
2022-08-03 17:09:36 +08:00
Johannes Reifferscheid 7ae5d00afa Fix a stack overflow in ScalarEvolution.
Unfortunately, this overflow is extremely hard to reproduce reliably (in fact, I was unable to do so). The issue is that:

- getOperandsToCreate sometimes skips creating an SCEV for the LHS
- then, createSCEV is called for the BinaryOp
- ... which calls getNoWrapFlagsFromUB
- ... which under certain circumstances calls isSCEVExprNeverPoison
- ... which under certain circumstances requires the SCEVs of all operands

For certain deep dependency trees, this causes a stack overflow.

Reviewed By: bkramer, fhahn

Differential Revision: https://reviews.llvm.org/D129745
2022-08-03 11:08:01 +02:00
Nicolai Hähnle 2fe3589acd ManagedStatic: remove from PluginLoader
Differential Revision: https://reviews.llvm.org/D129123
2022-08-03 10:41:58 +02:00
Nicolai Hähnle 48c401a60e ManagedStatic: remove from TimeProfiler
Differential Revision: https://reviews.llvm.org/D129121
2022-08-03 10:41:58 +02:00
Liu, Chen3 5bbb0a831f [X86] Using `X86MemOperand` instead of `Operand` for `i32mem_TC` and `i64mem_TC`
To fix build fail when X86_GEN_FOLD_TABLES is enabled.

Differential Revision: https://reviews.llvm.org/D131049
2022-08-03 16:17:51 +08:00
Nikita Popov b128e057c1 [AA] Make ModRefInfo a bitmask enum (NFC)
Mark ModRefInfo as a bitmask enum, which allows using normal
& and | operators on it. This supersedes various functions like
unionModRef() and intersectModRef(). I think this makes the code
cleaner than going through helper functions...

Differential Revision: https://reviews.llvm.org/D130870
2022-08-03 10:05:55 +02:00
Max Kazantsev 34ae308c73 [SCEV] Use context to strengthen flags of BinOps
Sometimes SCEV cannot infer nuw/nsw from something as simple as
```
  len in [0, MAX_INT]
...
  iv = phi(0, iv.next)
  guard(iv <s len)
  guard(iv <u len)
  iv.next = iv + 1
```
just because flag strenthening only relies on definition and does not use local facts.
This patch adds support for the simplest case: inference of flags of `add(x, constant)`
if we can contextually prove that `x <= max_int - constant`.

In case if it has negative CT impact, we can add an option to switch it off. I woudln't
expect that though.

Differential Revision: https://reviews.llvm.org/D129643
Reviewed By: apilipenko
2022-08-03 14:08:57 +07:00
Craig Topper f19497f7b0 [RISCV] Use InstVisitor in RISCVCodeGenPrepare. NFC
Makes it easy to add new instructions to look at without dispatching
manually.
2022-08-02 21:19:30 -07:00
Chuanqi Xu ce1b24cca8 [IRBuilder] Handle constexpr-bitcast for IRBuilder::CreateThreadLocalAddress
In case that opaque pointers not enabled, there may be some constexpr
bitcast uses for thread local variables and the design of llvm allow
people to sink constant arbitrarily. This breaks the assumption of
IRBuilder::CreateThreadLocalAddress. This patch tries to handle the
case.
2022-08-03 11:13:49 +08:00
Paul Kirth d434e40f39 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-08-03 00:09:45 +00:00
Austin Kerbow 3dfa562643 [AMDGPU] Add CL option for max-ilp scheduler.
When compiling for multiple targets the scheduler that is selected via the
-misched option is applied globally. This patch adds a target CL option instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D131022
2022-08-02 16:52:14 -07:00
Craig Topper a5605f1f68 [RISCV] Fix operand number in debug message in RISCVMergeBaseOffset.
This used to print from the ADDI where the operand number was
correct. It recently changed to print from the LUI or AUIPC which
needs to use operand 1 instead of 2.

This shows up as a crash with -debug.
2022-08-02 15:27:23 -07:00
Nicolai Hähnle f7872cdce1 CommandLine: add and use cl::SubCommand::get{All,TopLevel}
Prefer using these accessors to access the special sub-commands
corresponding to the top-level (no subcommand) and all sub-commands.

This is a preparatory step towards removing the use of ManagedStatic:
with a subsequent change, these global instances will be moved to
be regular function-scope statics.

It is split up to give downstream projects a (albeit short) window in
which they can switch to using the accessors in a forward-compatible
way.

Differential Revision: https://reviews.llvm.org/D129118
2022-08-02 23:49:16 +02:00
Austin Kerbow 40eec27618 [AMDGPU] Add llvm_unreachable to switch statement added in d7100b398. 2022-08-02 13:45:38 -07:00
Mircea Trofin 4146c1756d [nfc] Remove unused parameter in TailDuplicator::duplicateSimpleBB
Differential Revision: https://reviews.llvm.org/D131008
2022-08-02 13:39:34 -07:00
Austin Kerbow d7100b398b [AMDGPU] Add GCNMaxILPSchedStrategy
Creates a new scheduling strategy that attempts to maximize ILP for a single
wave.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130869
2022-08-02 13:21:24 -07:00
Xiang Li 20f7f9b709 [NFC][DirectX backend] Fix crash when emit_obj for DirectX backend.
When emit-obj from clang directly, DirectX backend will hit assert caused by not initialize passes for AsmPrinter.
The fix will initialize the passes by calling createPassConfig.
Also ignore global variable which not has section in DXILAsmPrinter::emitGlobalVariable to avoid hit llvm_unreachable in DXILTargetObjectFile::SelectSectionForGlobal.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D130856
2022-08-02 12:09:07 -07:00
Arthur Eubanks 43aa4ac70b [StandardInstrumentations] Assign names to basic blocks without names
Fixes code in OrderedChangedData<T>::report which assumes that a string will only appear once in Before/After.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D130587
2022-08-02 11:04:01 -07:00
Kai Nacke b38375378d [GIsel] Add missing libcall for G_MUL to LegalizerHelper
The LegalizerHelper misses the code to lower G_MUL to a library call,
which this change adds.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D130987
2022-08-02 13:35:25 -04:00
Vladislav Dzhidzhoev 71aecbb75c [AArch64] Treat x18 as callee-saved in functions with Windows calling convention on Darwin
rGcf97e0ec42b8 makes $x18 to be treated as callee-saved in functions with
Windows calling convention on non-Windows OSes.

Here we mark $x18 as callee-saved for functions with Windows calling
convention on Darwin, as well as on other non-Windows platforms, in
order to prevent some miscompilations (like miscompilation of
win64cc-darwin-backup-x18.ll).

Since getCalleeSavedRegs doesn't return x18 in list of callee-saved
registers, assignCalleeSavedSpillSlots and determineCalleeSaves
consider different sets of registers as callee-saved. It causes an
error:
```
Assertion failed: ((!HasCalleeSavedStackSize || getCalleeSavedStackSize() == Size) && "Invalid size calculated for callee saves"), function getCalleeSavedStackSize, file
AArch64MachineFunctionInfo.h, line 292.
```

Differential Revision: https://reviews.llvm.org/D130676
2022-08-02 20:33:42 +03:00
Vladislav Dzhidzhoev f6d9f00031 [DebugInfo] Test commit: update irrelevant comments
Differential Revision: https://reviews.llvm.org/D130998
2022-08-02 20:21:24 +03:00
Guozhi Wei 85a6dd50ad [MIPS] Expose the ZERO register as a constant physical register
The ZERO register should be exposed as a constant physical register through the interface TargetRegisterInfo::isConstantPhysReg.

Differential Revision: https://reviews.llvm.org/D130932
2022-08-02 17:04:52 +00:00
Craig Topper ae6877836e [RISCV] Add scheduler classes to PseudoVMV*R_V.
I think these pseudos will exist when the post-RA scheduler runs
so they should have sched classes.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D130945
2022-08-02 09:38:32 -07:00
Craig Topper 2e5c516a3d [RISCV] Add scheduler class to PseudoReadVLENB.
Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D130938
2022-08-02 09:38:32 -07:00
Alexander Timofeev a321d95b59 [AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs
In the 2e29b0138c we introduce a specific solving algorithm
that analyzes the VGPR to SGPR copies use chains and either lowers
the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms.
Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs
in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs
are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert
copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues.
At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain
were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF
because of the weird mixing of SALU and VALU instructions.
This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact
that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs,
we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common
VGPR to SGPR lowering algorithm.
This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130367
2022-08-02 18:37:57 +02:00
Jay Foad e301e071ba [AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline
This pass seems to have very little effect because all it does is hoist
some instructions, but it is followed later in the codegen pipeline by
the IR CodeSinking pass which does the opposite.

Differential Revision: https://reviews.llvm.org/D130258
2022-08-02 17:35:20 +01:00
Jay Foad c24d68fff1 [AMDGPU] Take advantage of VOP3 literals in convertToThreeAddress
This improves a corner case where v_fmac can be converted to v_fma on
GFX10+ even if it has a literal operand.

Differential Revision: https://reviews.llvm.org/D130992
2022-08-02 17:27:11 +01:00
Philip Reames 0b47615fcf [LV] Recognize store of invariant value to invariant address as uniform
This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result.

For context, the basic structure of the existing code and how the change fits in:
* First, we select a widening strategy. (The result is irrelevant for this patch.)
* Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.)
* If it is, we overrule the widening strategy, and unconditionally scalarize.
* VPReplicationRecipe - which is what actually does the scalarization - knows how to handle unform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.)

An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions.

Differential Revision: https://reviews.llvm.org/D130364
2022-08-02 08:09:49 -07:00
Phoebe Wang 23021d4d8c [X86][FP16] Fix vector_shuffle and lowering without f16c feature problems
The problem Alexander reported on D127982 was caused by an optimization
for AVX512-FP16 instruction. We must limit it to the feature enabled only.

During the investigation, I found we didn't expand for fp_round/fp_extend
without F16C. This may result runtime crash, so change them too.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130817
2022-08-02 22:26:41 +08:00
Jay Foad bb2832410e [IRBuilder] CreateIntrinsic with implicit mangling
Add a new IRBuilderBase::CreateIntrinsic which takes the return type and
argument values for the intrinsic call but does not take an explicit
list of types to mangle. Instead the builder works this out from the
intrinsic declaration and the types of the supplied arguments.

This means that the mangling is hidden from the client, which in turn
means that intrinsic definitions can change which arguments are mangled
without requiring any changes to the client code.

Differential Revision: https://reviews.llvm.org/D130776
2022-08-02 13:08:35 +01:00
David Green 1206f72e31 [AArch64] Fold Mul(And(Srl(X, 15), 0x10001), 0xffff) to CMLTz
This folds a v4i32 Mul(And(Srl(X, 15), 0x10001), 0xffff) into a v8i16
CMLTz instruction. The Srl and And extract the top bit (whether the
input is negative) and the Mul sets all values in the i16 half to all
1/0 depending on if that top bit was set. This is equivalent to a v8i16
CMLTz instruction. The same applies to other sizes with equivalent
constants.

Differential Revision: https://reviews.llvm.org/D130874
2022-08-02 13:01:59 +01:00
Simon Pilgrim b651fdff79 [DAG] matchRotateSub - ensure the (pre-extended) shift amount is wide enough for the amount mask (PR56859)
matchRotateSub is given shift amounts that will already have stripped any/zero-extend nodes from - so make sure those values are wide enough to take a mask.
2022-08-02 11:38:52 +01:00
David Sherwood 4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
2022-08-02 09:52:33 +01:00
Alex Bradbury 5ad59c9e59 [RISCV][NFCI] Set TransientStackAlignment and rely on it rather than RVV-specific logic on RVV-less functions
* TargetFrameLowering has a TransientStackAlignment field that "returns
  the number of bytes to which the stack pointer must be aligned at all
  times, even between calls.
  * As explained in the [RISC-V calling
    convention](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc),
    the stack pointer must remain fully aligned throughout execution for
    compliant code. This is important for embedded targets that might avoid
    realigning the stack pointer for interrupt service routines. Systems
    running full OSes may always realign the stack anyway.
* TransientStackAlignment is used in estimateStackSize in
  MachineFrameInfo and in PEI::calculateFrameObjectOffsets.
  * estimateStackSize is only used in the RISC-V backend for scavenging
    slots. It may be possible to craft a function where the difference
    is observable, but it wouldn't be a meaningful test.
  * calculateFrameObjectOffsets makes use of TransientStackAlignment,
    but then sets the stack alignment to the max of that alignment and
    MaxAlign, which is unconditionally set to 16 in
    RISCVFrameLowering::processFunctionBeforeFrameFinalized
  * I've changed this logic to only set MaxAlign if there are RVV frame
    objects. There should be no functional change here for either RVV
    targets (MaxAlign is set as before) or non-RVV targets
    (TransientStackAlign is now 16 anyway).

Differential Revision: https://reviews.llvm.org/D130068
2022-08-02 09:46:06 +01:00
Tim Northover b586dc21a7 Outliner: add "target-cpu" feature from source function to outlined
The CPU is used to determine which inline asm instructions are allowed, so
needs to be copied across in case the outlined function contains any.
2022-08-02 09:33:29 +01:00
wanglian e208bab55f [RISCV][NFC] Use defined variable instead some code.
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130687
2022-08-02 16:26:33 +08:00
jacquesguan e38af7ba95 [LV] Refactor getExtendedAddReductionCost to support other extended reduction more than Add.
Now the API getExtendedAddReductionCost is used to determine the cost of extended Add reduction with optional Mul. For Arm, it could cover the cases. But for other target, for example: RISCV, they support other kinds of extended recution, such as FAdd.

This patch does the following changes:
1, Split getExtendedAddReductionCost into 2 new API: getExtendedReductionCost which handles the extended reduction with addtional input of Opcode; getMulAccReductionCost which handle the MLA cases the getExtendedAddReductionCost.
2, Refactor getReductionPatternCost, add some contraint condition to make sure the getMulAccReductionCost should only handle the reuction of Add + Mul.

Differential Revision: https://reviews.llvm.org/D130868
2022-08-02 16:02:38 +08:00
Sotiris Apostolakis 995b61cdac [SelectOpti] Auto-disable other cmov optis when the new select-opti pass is enabled
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D129817
2022-08-02 00:19:59 +00:00
Sunho Kim 5680ef870f [IntelJITEvents] Add missing include.
Fixes compilation error.

Differential Revision: https://reviews.llvm.org/D130898
2022-08-02 08:45:14 +09:00
Martin Sebor bcef4d238d [InstCombine] Correct strtol folding with nonnull endptr
Reflect in the pointer's offset the length of the leading part
of the consumed string preceding the first converted digit.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130912
2022-08-01 16:47:05 -06:00
Craig Topper da5b1bf5bb [RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic.
It's possible we have:
lui  a0, %hi(sym)
addi a0, %lo(sym)
addi a0, <offset1>
lw a0, <offset2>(a0)

We want to arrive at
lui a0, %hi(sym+offset1+offset2)
lw a0, %lo(sym+offset1+offset2)

We currently fail to do this because we only consider loads/stores
if we didn't find any arithmetic.

This patch splits arithmetic folding and load/store folding into
two separate phases. The load/store folding can no longer assume
the offset in hi/lo is 0 so we must combine the offsets. I've applied
the same simm32 limit that we applied in the arithmetic folding.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D130931
2022-08-01 15:33:21 -07:00
Ilia Diachkov b25b507c77 [SPIRV] use tablegen to create SPIRVBaseInfo*
The patch replaces SPIRVBaseInfo.* previously created using macros by
the tablegen approach. There are many small changes in other files due to
differences in namespaces. Also, functions in SPIRVUtils are moved to
the llvm namespace.

Differential Revision: https://reviews.llvm.org/D130518

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
2022-08-02 01:57:23 +03:00
Craig Topper e07a8155f5 [RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc.
addMachineSSAOptimization is skipped for -O0, but this pass is
required for -O0.
2022-08-01 13:44:43 -07:00
Vasileios Porpodas f669030373 [TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2.
2xi64 is the legalized type for wide reductions (like 16xi64) and setting the
cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable.

The few performance measurments that I did on an aarch64 machine confirm that
these patterns are actually faster when vectorized.

Differential Revision: https://reviews.llvm.org/D130740
2022-08-01 13:03:14 -07:00
Fangrui Song 2b70bebc6d [MachineFunctionPass] Support -print-changed={,c}diff{,-quiet}
Follow-up to D130434.
Move doSystemDiff to PrintPasses.cpp and call it in MachineFunctionPass.cpp.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D130833
2022-08-01 12:56:15 -07:00
Vang Thao 7fc52d7c8b [AMDGPU] Fix DGEMM hazard for GFX90a
For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert
wait-states to avoid data hazard. However when there is a DGEMM instruction
in-between them, SQ incorrectly disables the wait-states thus the data hazard
needs to be handled with this workaround.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130677
2022-08-01 11:56:22 -07:00
Craig Topper 450edb0b37 [RISCV] Explicitly select second operand of branch condition to X0.
At least based on the lit tests, the coalescer sometimes fails to
propagate the copy from X0 into the branch instruction. This patch
does it manually during isel. The majority of the changes are from
the select patterns.

Some of the changes are just register allocation changes. Only
the Select change affects the whether a b*z instruction is generated
in the tests. I changed the branch pattern for consistency.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130809
2022-08-01 11:16:48 -07:00
Craig Topper ad8db972b0 [RISCV] Eagerly delete instructions in MergeBaseOffset.
The only iterator we're holding points to HiLUI and we never
delete that so I think it is safe to delete everything else
immediately.

I want to split detectAndFoldOffset into two phases. First, combine
LUI+ADDI with any ADD/ADDI/SHXADD that comes after it. This may
open opportunities to fold the ADDI from the LUI+ADDI into a
load/store address. So the load/store folding should run as a
second phase even if the ADD/ADDI/SHXADD made changes.

In order to do this we need to eagerly delete instructions in the
first phase so that we don't have dead users of the LUI+ADDI
when we start the second phase.

Patches to split the phases will come later.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D130119
2022-08-01 09:32:46 -07:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Piotr Sobczak f29a19b0b8 [AMDGPU] Extend cases for ReadM0MovRelInterpHazard
Extend hazard recognizer of ReadM0MovRelInterpHazard with
DS_READ_ADDTID and DS_WRITE_ADDTID, as they also
require a manually inserted S_NOP after SALU writing m0.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130783
2022-08-01 17:59:33 +02:00
Simon Pilgrim 27105e2f30 MisExpect.h - fix Wdocumentation warnings. NFC. 2022-08-01 15:06:30 +01:00
Dmitry Preobrazhensky 3aae8cd842 [AMDGPU][MC] Verify selection of LDS MUBUF opcodes
Differential Revision: https://reviews.llvm.org/D130761
2022-08-01 16:44:39 +03:00
Anubhab Ghosh ac3cb4ecd0 [Orc] Disable use of shared memory on Android
shm_open and shm_unlink are not available on Android. This commit
disables SharedMemoryMapper on Android until a better solution is
available.

https://android.googlesource.com/platform/bionic/+/refs/heads/master/docs/status.md
https://github.com/llvm/llvm-project/issues/56812

Differential Revision: https://reviews.llvm.org/D130814
2022-08-01 18:48:39 +05:30
Dmitry Preobrazhensky bb901dcc5a [AMDGPU][MC][GFX940] Correct disassembly of MFMA opcodes
Add a decoder table for GFX940 MFMA opcodes.

Differential Revision: https://reviews.llvm.org/D130759
2022-08-01 16:00:47 +03:00
Jay Foad a5a7a9da39 [X86] Fix updating LiveVariables in convertToThreeAddress
Fix all instances of:
*** Bad machine code: Kill missing from LiveVariables ***
in the X86 CodeGen tests with D129213 applied, which adds verification
of LiveIntervals after the TwoAddressInstruction pass runs.

Differential Revision: https://reviews.llvm.org/D129634
2022-08-01 13:45:21 +01:00
Lucas Prates ba9caf9170 [Arm] Fix parsing and emission of Tag_also_compatible_with eabi attribute
According to the ABI for the Arm Architecture, the value for the
Tag_also_compatible_with eabi attribute is represented by an NTBS entry.
This string value, in turn, is composed of a pair of tag+value encoded
in one of two formats:
- ULEB128: tag, ULEB128: value, 0.
- ULEB128: tag, NBTS: data.
(See [[ 60a8eb8c55/addenda32/addenda32.rst (3373secondary-compatibility-tag) | section 3.3.7.3 on the Addenda to, and Errata in, the ABI for the Arm Architecture ]].)

Currently the Arm assembly parser and streamer ignore the encoding of
the attribute's NTBS value, which can result in incorrect attributes
being emitted in both assembly and object file outputs.

This patch fixes these issues by properly handing the value's encoding.
An update to llvm-readobj to properly handle the attribute's value will be
covered by a separate patch.

Patch by Victor Campos and Lucas Prates.

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D129500
2022-08-01 13:28:01 +01:00
Marius Brehler ddb6c28638 Avoid comparison of integers of different signs
Otherwiese a warning is emitted when compiling with `-Wsign-compare`.
2022-08-01 11:20:41 +00:00
Pierre van Houtryve a847e3dc52 [NFC][AMDGPU] Fix typo in SIRegisterInfo.cpp 2022-08-01 07:01:33 -04:00
Simon Pilgrim b43d7aacf8 [DAG] visitINSERT_VECTOR_ELT - extend folding to BUILD_VECTOR if all missing elements from an insertion chain are known zero 2022-08-01 11:32:33 +01:00
Petar Avramovic e8d260753e [AMDGPU] gfx11 allow dlc for MUBUF atomics
Add MC support for dlc in gfx11 MUBUF atomic instructions.

Differential Revision: https://reviews.llvm.org/D129075
2022-08-01 12:18:01 +02:00
Dominik Adamski d90b7bf2c5 Add support for lowering simd if clause to LLVM IR
Scope of changes:
  1) Added new function to generate loop versioning
  2) Added support for if clause to applySimd function
  2) Added tests which confirm that lowering is successful

If ifCond is specified, then collapsed loop is duplicated and if branch
is added. Duplicated loop is executed if simd ifCond is evaluated to false.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D129368

Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
2022-08-01 04:43:32 -05:00
David Sherwood 41119a0f52 [DAGCombiner] Extend visitAND to include EXTRACT_SUBVECTOR
Eliminate an AND by redefining an anyext|sext|zext.

     (and (extract_subvector (anyext|sext|zext v) _) iN_mask)
  => (extract_subvector (zeroext_iN v))

Differential Revision: https://reviews.llvm.org/D130782
2022-08-01 10:32:32 +01:00
Luís Marques 0bc177b6f5 [RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI
Builds upon D123264, adding support for merging the low part of the LLA
address into the load/store instruction offsets.

Differential Revision: https://reviews.llvm.org/D123265
2022-08-01 11:30:02 +02:00
Vladislav Dzhidzhoev facb3ac385 [GlobalISel][DebugInfo] salvageDebugInfo analogue for gMIR
Salvage debug info of instruction that is about to be deleted as dead in
Combiner pass. Currently supported instructions are COPY and G_TRUNC.

It allows to salvage debug info of some dead arguments of functions, by putting
DWARF expression corresponding to the instruction being deleted into related
DBG_VALUE instruction.

Here is an example of missing variables location https://godbolt.org/z/K48osb9dK.
We see that arguments x, y of function foo are not available in debugger, and
corresponding DBG_VALUE instructions have undefined register operand instead of
variables locaton after Aarch64PreLegalizerCombiner pass. The reason is that
registers where variables are located are removed as dead (with instruction
G_TRUNC). We can use salvageDebugInfo analogue for gMIR to preserve debug
locations of dead variables.

Statistics of llvm object files built with vs without this commit on -O2
optimization level (CMAKE_BUILD_TYPE=RelWithDebInfo, -fglobal-isel) on Aarch64 (macOS):

Number of variables with 100% of parent scope covered by DW_AT_location has been increased by 7,9%.
Number of variables with 0% coverage of parent scope has been decreased by 1,2%.
Number of variables processed by location statistics has been increased by 2,9%.
Average PC ranges coverage has been increased by 1,8 percentage points.

Coverage can be improved by supporting more instructions, or by calling
salvageDebugInfo for instructions that are deleted during Combiner rules exection.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D129909
2022-08-01 11:14:53 +02:00
Alex Bradbury 9bf2d8cbbe [NFC] Use AllocaInst's getAddressSpace helper 2022-08-01 10:11:16 +01:00
Nikita Popov 7314ad7a06 Revert "[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions"
This reverts commit 7b0f6378e2.

As commented on the review, this patch has a correctness issue
regarding the modelling of memory effects.
2022-08-01 09:20:56 +02:00
Momchil Velikov 7b0f6378e2 [SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions
SimplifyCFG does some common code hoisting, which is limited to hoisting a
sequence of identical instruction in identical order and stops at the first
non-identical instruction.

This patch allows hoisting instruction pairs over same-length sequences of
non-matching instructions. The linear asymptotic complexity of the algorithm
stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit`
serving to limit compilation time and/or the size of the hoisted live ranges.

The patch improves SPECv6/525.x264_r by about 10%.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D129370
2022-08-01 07:55:14 +01:00
Nikita Popov 4ec22ba9c8 [GlobalsAA] Remove unnecessary AAResultBase fallback (NFC)
This is unnecessary, as AA result chaining is implemented at a
higher level now.
2022-08-01 08:34:58 +02:00
Nikita Popov a21c245307 [ARMParallelDSP] Remove unnecessary ModRef intersection (NFC)
Intersecting with ModRef is a no-op, as these are the only two
possible values.
2022-08-01 08:34:58 +02:00
Nikita Popov 5b1d10bda6 [AA] Drop setModAndRef() function (NFC)
Without the "must" state, this function is pointless, because we
can just directly create a ModRef instead.
2022-08-01 07:55:39 +02:00
Nikita Popov 34683c3e35 [MSSA] Fix expensive checks build 2022-08-01 07:28:52 +02:00
Nikita Popov f96ea53e89 [AA] Do not track Must in ModRefInfo
getModRefInfo() queries currently track whether the result is a
MustAlias on a best-effort basis. The only user of this functionality
is the optimized memory access type in MemorySSA -- which in turn
has no users. Given that this functionality has not found a user
since it was introduced five years ago (in D38862), I think we
should drop it again.

The context is that I'm working to separate FunctionModRefBehavior
to track mod/ref for different location kinds (like argmem or
inaccessiblemem) separately, and the fact that ModRefInfo also has
an unrelated Must flag makes this quite awkward, especially as this
means that NoModRef is not a zero value. If we want to retain the
functionality, I would probably split getModRefInfo() results into
a part that just contains the ModRef information, and a separate
part containing a (best-effort) AliasResult.

Differential Revision: https://reviews.llvm.org/D130713
2022-08-01 07:14:31 +02:00
Chuanqi Xu 9701053517 Introduce @llvm.threadlocal.address intrinsic to access TLS variable
This belongs to a series of patches which try to solve the thread
identification problem in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for a full background.

The problem consists of two concrete problems: TLS variable and readnone
functions. This patch tries to convert the TLS problem to readnone
problem by converting the access of TLS variable to an intrinsic which
is marked as readnone.

The readnone problem would be addressed in following patches.

Reviewed By: nikic, jyknight, nhaehnle, ychen

Differential Revision: https://reviews.llvm.org/D125291
2022-08-01 10:51:30 +08:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Kazu Hirata d11103f9a0 [Hexagon] Remove unused declaration adjustForCalleeSavedRegsSpillCall (NFC)
The function definition was removed on Apr 23, 2015 in commit
876a19d855, but the declaration has
remained since.
2022-07-31 15:17:06 -07:00
Kazu Hirata 71638b8be7 [ExecutionEngine] Ensure newlines at the end of files (NFC) 2022-07-31 15:16:58 -07:00
Luís Marques 260a641068 [RISCV] Pre-RA expand pseudos pass
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.

Differential Revision: https://reviews.llvm.org/D123264
2022-07-31 23:19:00 +02:00
Sanjay Patel 02b3a35892 [InstSimplify] fold FP rounding intrinsic with rounded operand
issue #56775

I rearranged the Thumb2 codegen test to avoid simplifying the chain
of rounding instructions. I'm assuming the intent of the test is
to verify lowering of each of those intrinsics.
2022-07-31 10:00:27 -04:00
Simon Pilgrim acb5abb7d3 [X86] getFauxShuffleMask - use DemandedElts variant of getTargetShuffleInputs. NFCI.
We don't specify the demanded elts yet, this patch just rewires the getTargetShuffleInputs calls and gives an "all demanded elts" mask.
2022-07-31 12:15:04 +01:00
Simon Pilgrim 9cdba33337 [X86] combineX86ShufflesRecursively - determine demanded elts to pass to getTargetShuffleInputs
Only PACKSS/PACKUS faux shuffles make use of the demanded elts at the moment, but this at least improves the handling of a couple of truncation patterns.
2022-07-31 11:30:40 +01:00
Sunho Kim c559072e46 [JITLink][COFF] Remove unused variable. 2022-07-31 09:19:17 +09:00
Sunho Kim b501770aef [JITLink][COFF] Handle COMDAT symbol with offset.
Handles COMDAT symbol with an offset and refactor the code to only generated symbol if the second symbol was encountered. This happens very infrequently but happens in recursive_mutex implementation of MSVC STL library.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130454
2022-07-31 09:09:48 +09:00
Sunho Kim d86f903b1d [JITLink][COFF][x86_64] Implement remaining IMAGE_REL_AMD64_REL32_*.
Implements remaining IMAGE_REL_AMD64_REL32_*. We only need IMAGE_REL_AMD64_REL32_4 for now but doing all remaining ones for completeness. (clang only uses IMAGE_REL_AMD64_REL32_1 and IMAGE_REL_AMD64_REL32)

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130452
2022-07-31 09:03:28 +09:00
Sunho Kim e781451140 [JITLink] Relax zero-fill edge assertions.
Relax zero-fill edge assertions to only consider relocation edges. Keep-alive edges to zero-fill blocks can cause this assertion which is too strict.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130450
2022-07-31 08:34:10 +09:00
Sunho Kim ee9cf336d6 [JITLink][COFF] Remove obsolete FIXMEs. (NFC) 2022-07-31 08:10:14 +09:00
Sunho Kim ea75c25833 [JITLInk][COFF] Remove unnecessary unique_ptr. (NFC) 2022-07-31 08:08:19 +09:00
Sunho Kim 067faddb55 [JITLink][COFF] Add explicit std::move.
Since ArgList is not copyable we need to make sure it's moved explicitly.
2022-07-31 08:01:00 +09:00
Sunho Kim 88181375a3 [JITLink][COFF] Implement include/alternatename linker directive.
Implements include/alternatename linker directive. Alternatename is used by static msvc runtime library. Alias symbol is technically incorrect (we have to search for external definition) but we don't have a way to represent this in jitlink/orc yet, this is solved in the following up patch.

Inlcude linker directive is used in ucrt to forcelly lookup the static initializer symbols so that they will be emitted. It's implemented as extenral symbols with live flag on that cause the lookup of these symbols.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130276
2022-07-31 07:49:59 +09:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Simon Pilgrim df457f583a [X86] Use std::tie so we can have more meaningful variable names for demanded bits/elts pairs. NFCI.
*.first + *.second were proving difficult to keep track of.
2022-07-30 18:57:15 +01:00
Kazu Hirata 12b29900a1 Use any_of (NFC) 2022-07-30 10:35:56 -07:00
Kazu Hirata 60db8d9b4e Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2022-07-30 10:35:48 -07:00
Kazu Hirata 66b6cc3acd [ExecutionEngine] Ensure a newline at the end of a file (NFC) 2022-07-30 10:35:43 -07:00
Craig Topper a23f07fb1d [RISCV] Add merge operands to more RISCVISD::*_VL opcodes.
This adds a merge operand to all of the binary _VL nodes. Including
integer and widening. They all share multiclasses in tablegen
so doing them all at once was easiest.

I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.

This does reduce the isel table size by about 25k so that's nice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130816
2022-07-30 10:26:38 -07:00
Craig Topper 9bf305fe2b [RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes.
Based on review feedback from D130816.
2022-07-30 09:57:05 -07:00
Simon Pilgrim a14f94c20c [X86] computeKnownBitsForTargetNode - out of range X86ISD::VSRAI doesn't fold to zero
Noticed by inspection and I can't seem to make a test case, but SSE arithmetic bit shifts clamp to the max shift amount (i.e. create a sign splat) - combineVectorShiftImm already does something similar.
2022-07-30 17:55:39 +01:00
Dmitry Vassiliev adc387460d [CodeGen] Fixed undeclared MISchedCutoff in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS
This patch fixes the error llvm/lib/CodeGen/MachineScheduler.cpp(755): error C2065: 'MISchedCutoff': undeclared identifier in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS.
Note MISchedCutoff is declared under #ifndef NDEBUG.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130425
2022-07-30 18:24:50 +02:00
Sanjay Patel 7073ec530e [InstCombine] canonicalize more zext-and-of-bool compare to narrow and
https://alive2.llvm.org/ce/z/vBNiiM

This matches variants of patterns that were folded with:
b5a9361c90
2022-07-30 11:22:05 -04:00
Austin Kerbow 7898426a72 [AMDGPU] Remove unused function 2022-07-30 07:47:35 -07:00
Simon Pilgrim 49c0980eac Fix Wdocumentation warning. NFC.
warning: '\returns' command used in a comment that is attached to a function returning void
2022-07-30 15:41:13 +01:00
Simon Pilgrim 813459ed2b [X86] combineSelect fold 'smin' style pattern select(pcmpgt(RHS, LHS), LHS, RHS) -> select(pcmpgt(LHS, RHS), RHS, LHS) if pcmpgt(LHS, RHS) already exists
Avoids repeated commuted comparisons when we're performing min/max and clamp patterns
2022-07-30 15:31:36 +01:00
Nuno Lopes d4b4747de5 ConstantFolding: fold OOB accesses to poison instead of undef 2022-07-30 15:20:32 +01:00
Sanjay Patel f95a6aea1b [InstCombine] avoid splitting a constant expression with div/rem fold
Follow-up to d4940c0f3d to further limit the transform
to avoid an unintended pattern/fold of a constant expression.
2022-07-30 09:45:25 -04:00
Simon Pilgrim 9ad082eb5a [DAG] Pull out repeated getOperand() calls for shuffle ops. NFC. 2022-07-30 14:02:54 +01:00
Simon Pilgrim 276480b1d3 [AMDGPU] Fix || vs && precedence warning. NFC. 2022-07-30 14:02:54 +01:00
Nuno Lopes fffabd5348 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-30 13:55:56 +01:00
Alexander Shaposhnikov 4220ef2be1 [InstCombine] Add fold for redundant sign bits count comparison
For power-of-2 C:
((X s>> ShiftC) ^ X) u< C --> (X + C) u< (C << 1)
((X s>> ShiftC) ^ X) u> (C - 1) --> (X + C) u> ((C << 1) - 1)

(https://github.com/llvm/llvm-project/issues/56479)

Test plan:
0/ ninja check-llvm check-clang + bootstrap LLVM/Clang
1/ https://alive2.llvm.org/ce/z/eEUfx3

Differential revision: https://reviews.llvm.org/D130433
2022-07-30 09:06:53 +00:00
Carl Ritson 4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Alexander Shaposhnikov d982f1e0c6 [InstCombine] Refactor foldICmpMulConstant
This is a follow-up to 2ebfda2417
(replace "if" with "else if" since the cases nuw/nsw
were meant to be handled separately).

Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests
2022-07-30 02:29:15 +00:00
Fangrui Song ce6dd4e835 Revert D130458 "[llvm-objcopy] Support --{,de}compress-debug-sections for zstd"
This reverts commit c26dc2904b.

The new Zstd dispatch has an ongoing design discussion related to https://reviews.llvm.org/D130516#3688123 .
Revert for now before it is resolved.
2022-07-29 15:46:51 -07:00
Sanjay Patel d4940c0f3d [InstCombine] fix miscompile from urem/udiv transform with constant expression
The isa<Constant> check could misfire on an instruction with 2 constant
operands. This bug was introduced with bb789381fc (D36988).

See issue #56810 for a C source example that exposed the bug.
2022-07-29 17:14:30 -04:00
Craig Topper e637feee80 [RISCV] Add isel pattern for (setne/eq GPR, -2048)
For constants in the range [-2047, 2048] we use addi. If the constant
is -2048 we can use xori. If we don't match this explicitly, we'll
emit an LI for the -2048 followed by an XOR.
2022-07-29 14:07:38 -07:00
Jay Foad 9436a85eb6 [IRBuilder] Make createCallHelper a member function. NFC.
This just avoids explicitly passing in 'this' as an argument in a bunch
of places.

Differential Revision: https://reviews.llvm.org/D130752
2022-07-29 21:17:26 +01:00
Austin Kerbow 2c82a126d7 [AMDGPU] Omit unnecessary waitcnt before barriers
It is not necessary to wait for all outstanding memory operations before
barriers on hardware that can back off of the barrier in the event of an
exception when traps are enabled. Add a new subtarget feature which
tracks which HW has this ability.

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D130722
2022-07-29 11:12:36 -07:00
Sanjay Patel b5a9361c90 [InstCombine] canonicalize zext-and-of-bool compare to narrow and
https://alive2.llvm.org/ce/z/3jYbEH

We should choose one of these forms, and the option that uses
the narrow type allows the motivating example from issue #56294
to reduce. In the best case (no 'not' needed and 'trunc' remains),
this does remove an instruction.

Note that there is what looks like a regression because there
is an existing canonicalization that turns trunc into and+icmp.
That is a long-standing transform, and I'm not sure what effect
reversing it would have.
2022-07-29 12:02:54 -04:00
Matt Devereau a8b726ac65 [AArch64][SVE] Change DupLane128Combine Index comparison to 0
IdxInsert == IdxDupLane is incorrect. IdxInsert is the starting element number,
whereas IdxIndex is the index of a quadword
2022-07-29 14:31:00 +00:00
Simon Pilgrim bc2c4f6c85 [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X) (REAPPLIED)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.

Updated version - original version rG057db2002bb3 didn't correctly account for multiple uses of the mask that might be folding "OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))" in canonicalizeBitSelect
2022-07-29 15:12:26 +01:00
Nikita Popov 5eaeeed8cb [InstCombine] Avoid ConstantExpr::getFNeg() calls (NFCI)
Instead call the constant folding API, which can fail. For now,
this should be NFC, as we still allow the creation of fneg
constant expressions.
2022-07-29 16:01:46 +02:00