Commit Graph

162738 Commits

Author SHA1 Message Date
Fangrui Song e18b7c7ae0 [llvm-objcopy] Support --decompress-debug-sections when zlib is disabled
When zlib is disabled at build time, the diagnostic `LLVM was not compiled with
LLVM_ENABLE_ZLIB: cannot decompress` for --decompress-debug-sections may be
inaccurate: if zstd is enabled, we should still support zstd decompression.

It's not useful to test zlib and zstd. Just remove the diagnostic and add a new
one before `compression::decompress`.

This fixes compress-debug-sections-zstd.test

Reviewed By: mariusz-sikora-at-amd, jhenderson, phosek

Differential Revision: https://reviews.llvm.org/D135744
2022-10-12 11:52:05 -07:00
Arthur Eubanks f59e1bcc22 [PrintPipeline] Handle CoroConditionalWrapper and add more verification
Add a check (can be disabled via a flag) that the pipeline we generate is actually parsable.
Can be disabled because we don't expect to handle every pass in -print-pipeline-passes.

Fixes #58280.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135703
2022-10-12 09:36:45 -07:00
Mirko Brkusanin 8b8463ef6c [SelectionDAG] Use consistent type sizes for opcode 2022-10-12 17:33:04 +02:00
Sanjay Patel 7b9482df3d [InstCombine] fold sdiv with common shl amount in operands
(X << Z) / (Y << Z) --> X / Y

https://alive2.llvm.org/ce/z/CLKzqT

This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.

This extends 008a89037a and is another transform
derived from issue #58137.
2022-10-12 11:32:15 -04:00
Alexey Bataev d71ad41080 [SLP]Fix insertpoint of the extractellements instructions to avoid reshuffle crash.
Need to set the insertpoint for extractelement to point to the first
instruction in the node to avoid possible crash during external uses
combine  process. Without it we may endup with the incorrect
transformation.

Differential Revision: https://reviews.llvm.org/D135591
2022-10-12 08:18:30 -07:00
Sanjay Patel 008a89037a [InstCombine] fold udiv with common shl amount in operands
(X << Z) / (Y << Z) --> X / Y

https://alive2.llvm.org/ce/z/E5eaxU

This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
2022-10-12 11:12:26 -04:00
Jordan Rupprecht cbae57c0e1 [NFC] Ignore unused var in no-asserts builds 2022-10-12 08:11:10 -07:00
Alexey Bataev 1be3428ea0 [SLP]Fix PR58177: Improve isUndefVector function to avoid extra freeze.
Freeze instruction in some cases makes codegen worse, so need to be very
careful when emitting it. Instead improve analysis in isUndefVector
function to generate mask of unused elements and use it in the analysis.

Differential Revision: https://reviews.llvm.org/D135382
2022-10-12 07:32:54 -07:00
gonglingqin ec2640bf3a [LoongArch] Handle missing CondCodes
Support SETLE/SETEQ and expand SETGE/SETNE/SETGT

Differential Revision: https://reviews.llvm.org/D135511
2022-10-12 21:26:57 +08:00
Sanjay Patel fe97f95036 [InstCombine] propagate "exact" through folds of div
These folds were added recently with:
6b869be810
8da2fa856f
...but they didn't account for the "exact" attribute,
and that can be safely propagated:
https://alive2.llvm.org/ce/z/F_WhnR
https://alive2.llvm.org/ce/z/ft9Cgr
2022-10-12 09:25:05 -04:00
Sanjay Patel d117ee25b8 [InstCombine] add helper function for div+shl folds; NFC
There are at least 2 similar patterns that could be added here,
and the existing fold can be improved because it fails to
propagate "exact".
2022-10-12 09:25:04 -04:00
gonglingqin b1d7a95e4e [LoongArch] Add earlyclobber of destination register to atomic instructions
If the AM* atomic memory access instruction has the same register number as
rd and rj, the execution will trigger an Instruction Non-defined Exception.
If the AM* atomic memory access instruction has the same register number as
rd and rk, the execution result is uncertain.

Reference: https://github.com/loongson/LoongArch-Documentation

Differential Revision: https://reviews.llvm.org/D135641
2022-10-12 21:09:21 +08:00
Florian Hahn c1fe52bfa6
[VPlan] Remove dead recipes before sinking.
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133762
2022-10-12 12:49:42 +01:00
Max Kazantsev fbad5fdc03 [NFC] Perform all legality checks for non-trivial unswitch in one function
They have been scattered over the code. For better structuring, perform
them in one place. Potential CT drop is possible because we collect exit
blocks twice, but it's small price to pay for much better code structure.
2022-10-12 18:35:12 +07:00
Luo, Yuanke f885c08034 Don't widen shuffle element with AVX512
Fix crash issue of D129537 and reopen it.

Currently the X86 shuffle lowering would widen the element type for
shuffle if the mask element value is adjacent. For below example

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

Compiler would transform the shuffle to
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the oppotunity to let ISel select mask instruction when
avx512 is enabled.

This patch is to prevent the tranform when avx512 feature is enabled.
Thank Simon for the idea.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130830
2022-10-12 19:18:10 +08:00
David Green 1e723b7ab3 Revert "[AArch64] Add support for 128-bit non temporal loads."
This reverts commit 661403b85c as the
custom lowering of loads prevents expanding unaligned loads with
strict-align.
2022-10-12 11:11:32 +01:00
Martin Storsjö a07787c9a5 [AArch64] Exclude instructions after setting the FP from SEH prologues
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.

This allows reverting the functional parts of
2f7fbf8376 (but fixing inconsistent
duplicate setting of HasWinCFI).

Differential Revision: https://reviews.llvm.org/D135686
2022-10-12 12:36:21 +03:00
Cullen Rhodes 388cacb341 [AArch64][SVE] Add instcombine for PTEST_ANY(X=OP(PG,...), X) -> PTEST_ANY(PG, X))
Given this is an OR reduction the two are equivalent and later
optimizations (AArch64InstrInfo::optimizePTestInstr) may rewrite the
sequence to use the flag-setting variant of instruction X, to remove the
PTEST altogether.

Reviewed By: paulwalker-arm, bsmith

Differential Revision: https://reviews.llvm.org/D134946
2022-10-12 09:14:08 +00:00
Cullen Rhodes a17fcb2230 [AArch64][SVE] Fix BRKNS bug in optimizePTestInstr
The BRKNS instruction is unlike the other instructions that set flags
since it has an all active implicit predicate, so the existing

  PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B)

in AArch64InstrInfo::optimizePTestInstr is incorrect, however

  PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B)

is correct.

Spotted by @paulwalker-arm in D134946.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135655
2022-10-12 08:34:41 +00:00
Martin Storsjö afa6bb643f [MC] [Win64EH] Generate ARM64 packed unwind info with signed return addresses
Differential Revision: https://reviews.llvm.org/D135660
2022-10-12 11:07:12 +03:00
Martin Storsjö c43bff64e9 [AArch64] Add support for the SEH opcode for return address signing
This was documented upstream in
https://github.com/MicrosoftDocs/cpp-docs/pull/4202.

Differential Revision: https://reviews.llvm.org/D135276
2022-10-12 11:07:11 +03:00
Max Kazantsev 6bfcac612f [SimpleLoopUnswitch][NFC] Separate legality checks from cost computation
These are semantically two different stages, but were entwined in the
old implementation. Now cost computation does not do legality checks,
and they all are done beforehead.
2022-10-12 13:31:36 +07:00
Max Kazantsev 421728b40c [NFC] Factor out computation of best unswitch cost candidate
Split out a major peice of this method to make code more readable.
2022-10-12 12:36:46 +07:00
chenglin.bi 41f5bbe18b [AArch64][Windows] Check sret attribute also for inreg attribute
Fix the issue: #57684

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135512
2022-10-12 09:58:50 +08:00
Yeting Kuo 2749b942e9 [RISCV] Add isel patterns for vmacc, vnmsac.
The patch selects VSELECT/VP_MERGE_VL which uses fmadd/fnmsub as true operand
and the adden of the fmadd/fnmsub as false operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135330
2022-10-12 09:19:01 +08:00
Craig Topper 1bdf21d55c [RISCV] Use mask/tail agnostic if tied source is IMPLICIT_DEF regardless of the policy operand.
If the source is implicit_def, the register allocator won't have
any constraint on what register it picks for the destination. This
doesn't give the user much control of what register is being used.

So in my mind that means the only reason to honor the policy operand
is to control what policy is used in vsetvli to maybe avoid a vtype
change. Given the other optimizations we do on the policy field, I
don't think allowing the user this control is reliable.

Therefore, I think we should use agnostic policies if the source is
undef.

This should give better performance on some CPUs for VP intrinsics where
there is no merge operand and the backend adds IMPLICIT_DEF to the instruction.

Differential Revision: https://reviews.llvm.org/D135396
2022-10-11 16:40:16 -07:00
Craig Topper ac9209751a Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"
This reverts commit 0148df8157.

Getting a lit test failures on AMDGPU but I can't reproduce it so far.
Reverting to investigate.
2022-10-11 16:30:40 -07:00
Craig Topper 902c1b3c2f [RISCV] Remove unused SchedClass WriteVFCvtFToFV. NFC
This isn't bound to any instruction. From the section comment it
was for single-width F-to-F conversions, but those don't exist.
2022-10-11 16:26:06 -07:00
Craig Topper 0148df8157 [DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.

This pattern shows up when type legalizing wide multiplies involving
a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399
2022-10-11 16:20:55 -07:00
Jessica Paquette 0f1a51e173 [GlobalISel] Allow vectors in redundant or + add combines
We support KnownBits for vectors, so we can enable these.

https://godbolt.org/z/r9a9W4Gj1

Differential Revision: https://reviews.llvm.org/D135719
2022-10-11 15:31:09 -07:00
Fangrui Song 8ef3fd8d59 [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

This fixes two problems.

(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```

(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-10-11 15:30:07 -07:00
Fangrui Song d8162a7196 [MC] .addrsig_sym: ignore unregistered symbols
.addrsig_sym forces registering the symbol regardless whether it is otherwise
registered. This creates an undefined symbol which is inconvenient/undesired:

* `extern int x; void f() { (void)x; }` has inconsistent behavior whether `x` is emitted as an undefined symbol.
  `-O0 -faddrsig` makes `x` undefined while other -O levels and -fno-addrsig eliminate the symbol.
* In ThinLTO, after a non-prevailing linkonce_odr definition is converted to available_externally, and then a declaration,
  the addrsig code emits a symbol while the symbol is otherwise unseen.

D135427 fixed a bug that a non-prevailing `__cxx_global_var_init` was
incorrectly retained. However, the IR declaration causes an undesired
`.addrsig_sym __cxx_global_var_init`. This can be addressed in a way similar
to D101512 (`isTransitiveUsedByMetadataOnly`) but the increased
`OutStreamer->emitAddrsigSym(getSymbol(&GV));` complexity makes me nervous.
Just ignoring unregistered symbols circumvents the problem.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D135642
2022-10-11 15:07:14 -07:00
Arthur Eubanks 60e4af7ab8 [CallGraph] Port -print-callgraph-sccs to new pass manager
And remove the legacy opt-specific pass.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D135487
2022-10-11 14:43:16 -07:00
Chris Bieneman 2b2afb2529 [DX] Add analysis and printer for shader flags
This adds infrastructural pieces for an analysis to compute the DXIL
shader flags. In this state the analysis can compute two fairly
straightforward feature flags for use of double-precision floating
point values and the DX 11.1 extended double support.

This patch does conflict with D135190, conflicts will be resolved prior
to merging.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135393

# Conflicts:
#	llvm/lib/Target/DirectX/CMakeLists.txt
#	llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
2022-10-11 14:27:05 -05:00
Abinav Puthan Purayil 3d9f011a9c [AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later
Unfortunately, we have a broken handling of this in the runtime of rocm
5.3. The runtime is expected to handle this correctly when v5 becomes
the default.

Differential Revision: https://reviews.llvm.org/D134714
2022-10-11 23:28:43 +05:30
Jessica Paquette 036a13065b [GlobalISel] Combine (X op Y) == X --> Y == 0
This matches patterns of the form

```
(X op Y) == X
```

And transforms them to

```
Y == 0
```

where appropriate.

Example: https://godbolt.org/z/hfW811c7W

Differential Revision: https://reviews.llvm.org/D135380
2022-10-11 09:52:48 -07:00
Philip Reames 487695e7c9 [SDAG] Treat DemandedElts argument to isSplatVector as splat for scalable vectors [nfc]
The previous code used a APInt(1, 0) to represent the demanded elts of a scalable vector, and then ignored that argument if type was scalable.  This was inconsistent with the UndefElts parameter which is set to either APInt(1, 0) or APInt(1,1) - that is, implicitly broadcast across all lanes.  Particularly since the undef code relied on the DemandedElts parameter having bitwidth 1 to achieve that result!

This change switches the demanded parameter to APInt(1,1), documents the broadcast semantics, and takes advantage of it to remove one special case for scalable vectors which is no longer required.
2022-10-11 09:49:28 -07:00
Joe Nash 3648fc5b42 [AMDGPU] Make disassembler convertFMAanyK call more generic
Make support more generic to support future instructions.
Currently NFC.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D135678
2022-10-11 11:22:25 -04:00
Krzysztof Parzyszek cb6804104f [Hexagon] Remove unused function, NFC 2022-10-11 08:05:22 -07:00
Philip Reames ac4f3fff8c [SDAG] Clarify behavior of scalable demanded/undef elts in isSplatValue [nfc]
Update comment, and add an assertion to check property expected by sole (non-test) caller.  Remove tests which appear to have been copied from fixed vector tests, and whose demanded bits don't correspond to the way this interface is otherwise used.
2022-10-11 07:28:34 -07:00
Florian Hahn be611ef7fa
[LoopRotation] Also drop block dispositions.
LoopRotation may also fold basic blocks, so cached block dispositions
also need to be dropped.

Fixes #58291.
2022-10-11 15:25:27 +01:00
Luke Drummond 940fa35ece [NVPTX] Fix a segfault for bitcasted calls with byval params
`getFunctionParamOptimizedAlign` was being passed a null function
argument when getting the callee of a bitcasted function symbol. This is
because `CallBase::getCalledFunction` does not look through bitcasts.

There is already code to handle this case in
`NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into
an NVPTX util.

The alignment computation now gracefully handles computing alignment of
virtual functions with a check for null.
2022-10-11 15:12:25 +01:00
Sanjay Patel 7ec604a317 [InstCombine] try harder to cancel out mul/div
((Op1 * X) / Y) / Op1 --> X / Y
https://alive2.llvm.org/ce/z/JYxWjA

InstSimplify handles the more basic mul+div pattern with
shared operand, but we don't seem to have any reassociation
folds to handle cases where the common op is further away.

This is a generalization of 9cff4711ac and another
transform derived from issue #58137.
2022-10-11 09:51:51 -04:00
Max Kazantsev 91aa9097ae [NFC] Factor out collection of unswitch candidate to a separate function
Just to make the code more structured and easier to understand.
2022-10-11 19:35:16 +07:00
Max Kazantsev f18979912d [NFC] Refine API in SimpleLoopUnswitch: add missing const notions 2022-10-11 19:35:16 +07:00
Max Kazantsev a8a07890aa [NFC] Refine API: add missing const notion in hasPartialIVCondition 2022-10-11 19:35:16 +07:00
Weining Lu 42b70793a1 Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC"
Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html

k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.

m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.

ZB: An address that is held in a general-purpose register. The offset
is zero.

ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.

Note:
The INLINEASM SDNode flags in below tests are updated because the new
introduced enum `Constraint_k` is added before `Constraint_m`.
  llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/X86/callbr-asm-kill.mir

This patch passes `ninja check-all` on a X86 machine with all official
targets and the LoongArch target enabled.

Differential Revision: https://reviews.llvm.org/D134638
2022-10-11 19:51:48 +08:00
Dmitry Preobrazhensky 4e62d02db9 [AMDGPU][MC] Correct image_gather4h
Correct encoding of image_gather4h for GFX9; disable this instruction for SI, CI and VI.

Differential Revision: https://reviews.llvm.org/D135605
2022-10-11 14:41:27 +03:00
Martin Storsjö 018ac7847b [AArch64] Add SEH_Nop opcodes for BTI hints
These are harmless for the unwinder - the unwinder doesn't need to
handle them for being able to unwind correctly.

Only add the opcodes when the branch target is in a SEH prologue;
for jumptables e.g. within a function, we shouldn't add any SEH
opcodes.

Differential Revision: https://reviews.llvm.org/D135277
2022-10-11 14:32:01 +03:00
wanglei 64c42a4d70 [LoongArch] Define getSetCCResultType for setting vector setCC type
To avoid trigger "No default SetCC type for vectors!" Assertion.

Differential Revision: https://reviews.llvm.org/D135527
2022-10-11 19:05:14 +08:00
wanglei d1b526fb95 [LoongArch] Add codegen support of GlobalTLSAddress lowering
There are static and dynamic TLS address lowering in DAG stage according
to different TLS models.

TLS address will be lowered to pseudo instruction and then expanded by
the `LoongArch Pre-RA pseudo instruction expansion` pass.

Differential Revision: https://reviews.llvm.org/D134713
2022-10-11 18:10:13 +08:00
Nikita Popov df8264c46a [SimplifyLibCalls] Use helper methods to query attributes (NFC) 2022-10-11 11:41:28 +02:00
Nikita Popov ac47db6aca [Attributes] Return Optional from getAllocSizeArgs() (NFC)
As suggested on D135572, return Optional<> from getAllocSizeArgs()
rather than the peculiar pair(0, 0) sentinel.

The method on Attribute itself does not return Optional, because
the attribute must exist in that case.
2022-10-11 11:05:21 +02:00
Nikita Popov 256976774f [Attributes] Support int attributes with zero value
This regularly comes up as a stumbling stone when adding int
attributes: They currently need to be encoded in a way to avoids
the zero value.

This adds support for zero-value int attributes by a) making the
ctor determine int/enum attribute based on attribute kind, not
whether the value is non-zero and b) switching getRawIntAttr()
to return an Optional, so that it's possible to distinguish a zero
value from non-existence.

Differential Revision: https://reviews.llvm.org/D135572
2022-10-11 10:38:27 +02:00
Nikita Popov 24b1340ff9 [AttrBuilder] Remove unused vscale accessors (NFC)
These accessors are not used. Generally, nowadays it is preferable
to perform queries on AttributeSets/Lists, rather than the
AttrBuilder, which is optimized towards attribute construction now.
2022-10-11 10:33:33 +02:00
Nikita Popov 454722745b [Attributes] Remove AttrBuilder::hasAlignmentAttr() method (NFC)
This was the odd one out, with similar methods not existing for
any other attributes. In the places where it is used, it is best
replaced by AttrBuilder::getAttribute(), which allows us to both
test for presence of the attribute and retrieve its value at the
same time. (To just check for presence, contains() could be used.)
2022-10-11 09:56:37 +02:00
Nikita Popov 7b7d3b20ff [AsmParser] Remove some redundant checks for align attributes
The verifier already has a more general check that the attribute is
legal for the position it is used in.
2022-10-11 09:53:08 +02:00
Nikita Popov 15c1ab25ab [Attributes] Remove special SRet/ByVal attribute handling in C API
Proper construction functions for these have long since been
exposed, and these attributes require a type nowadays, so drop the
old compatibility code.
2022-10-11 09:39:39 +02:00
Nikita Popov 884bb97dca [MustExec][LICM] Handle latch being part of an inner cycle (PR57780)
The algorithm in allLoopPathsLeadToBlock() does not handle the case
where the loop latch is part of the predecessor set correctly: In
this case, we may take the backedge (escaping to a different loop
iteration) and not execute other latch successors. This can happen
if the latch is part of an inner cycle.

Fixes https://github.com/llvm/llvm-project/issues/57780.

Differential Revision: https://reviews.llvm.org/D134279
2022-10-11 09:30:13 +02:00
Daniel Sanders 4a95a64e4a [instcombine] (extelt (inselt Vec, Value, Index), Index) -> Value
When Index is variable but still trivially known to be equal we can use Value
from before the insertion, possibly eliminating the vector.

Reverts a functional change from:
Author: Philip Reames <listmail@philipreames.com>
Date:   Wed Dec 8 12:21:10 2021 -0800

    [instcombine] A couple style tweaks to visitExtractElementInst [nfc]

Thanks to Michele Scandale for identifying the bug

Differential Revision: https://reviews.llvm.org/D135625
2022-10-10 15:41:53 -07:00
Michal Paszkowski 7a3c9a85c5 [SPIRV] Fix call lowering of "anonymous" functions
The patch fixes lowering of anonymous functions, removes file/linkage
info for builtin call demangling, and adds relevant test demonstrating
a fixed problem.

Differential Revision: https://reviews.llvm.org/D135390
2022-10-11 00:06:29 +02:00
Craig Topper 0121b1a4ac Revert "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant."
This reverts commit d4facda414.

This has been reported to cause failures. Reverting while I investigate.
2022-10-10 14:53:29 -07:00
David Green deb8f8ab17 [ARM] Add errors for MVE exclusive registers.
These instructions already had errors for operands that could not share
the same register:
  VCMUL, VMULL, VQDMULL.
This extends that to a few others:
  VREV64, VQDMULLqr, VCADD and VHCADD.
Only the i32 types require the error.

Differential Revision: https://reviews.llvm.org/D135560
2022-10-10 22:06:35 +01:00
Joe Nash 8a7d4993b7 [AMDGPU] Fix True16 patterns for cmp on GFX11
These patterns should have a True16 version and a non-true16 version.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135609
2022-10-10 16:41:06 -04:00
Florian Hahn 4b599fa1ee
[SCEV] Verify block disposition cache.
This extends the existing SCEV verification to catch cache invalidation
issues as in #57837.

The validation logic is similar to the recently added loop disposition
cache validation in bb68b2402d.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134531
2022-10-10 20:42:19 +01:00
Sanjay Patel baab4aa1ba [VectorCombine] convert scalar fneg with insert/extract to vector fneg
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask

This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.

This shows up in the motivating example from #issue 58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.

Differential Revision: https://reviews.llvm.org/D135278
2022-10-10 14:59:56 -04:00
Jordan Rupprecht fb27fd5f88 Revert "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally"
This reverts commit 4fbe33593c. It causes linking errors, with details provided internally. (Hopefully the author/reviewers will be able to upstream the internal repro).
2022-10-10 11:40:45 -07:00
Craig Topper d4facda414 [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135541
2022-10-10 11:02:22 -07:00
Florian Hahn 4b6bd1c9d5
[LoopSimplifyCFG] Clear SCEV dispositions when removing dead blocks.
When removing loops & blocks we also need to clear the SCEV dispositions
as they may now contain incorrect values.

Fixes #58262.
2022-10-10 18:08:35 +01:00
Florian Hahn 80e49f49e4
[ConstraintElimination] Bail out for GEPs with scalable vectors.
This fixes a crash with scalable vectors, thanks @nikic for spotting
this!
2022-10-10 16:01:20 +01:00
Shubham Narlawar b920407cf5 [LICM] Disable thread-safety checks in single-thread model
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.

Fixes https://github.com/llvm/llvm-project/issues/50537.

Differential Revision: https://reviews.llvm.org/D130466
2022-10-10 16:51:16 +02:00
Alex Brachet deb82d4a20 Revert "[PGO] Make emitted symbols hidden"
This reverts commit 4ea1a647ff.

This breaks on Darwin which tries to export these symbols
ebb258d3b0/clang/lib/Driver/ToolChains/Darwin.cpp (L1363)

I'll try to reland which that removed and approval from
Apple folks.
2022-10-10 14:37:59 +00:00
Joe Nash ebb258d3b0 [AMDGPU] Make V_SAT_PK_U8_I16 a True16 Instruction
The return type is two u8 packed into a 16 bit VGPR, so this instruction
should be True16.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D135478
2022-10-10 10:33:49 -04:00
Sanjay Patel 9cff4711ac [InstCombine] fold udiv with common factor
((X *nuw Y) >> Z) / X --> Y >> Z

https://alive2.llvm.org/ce/z/x3kKnq

This is similar to 6b869be810 / 8da2fa856f, but I have
not found a signed equivalent, so it's just an unsigned
match for now.
2022-10-10 08:12:06 -04:00
Peter Rong 2343ad755d [AArch64] Add index check before lowerInterleavedStore() uses ShuffleVectorInst's mask
This commit fixes https://github.com/llvm/llvm-project/issues/57326.

Currently we would take a Mask out and directly use it by doing
auto Mask = SVI->getShuffleMask();
However, if the mask is undef, this Mask is not initialized. It might be
a vector of -1 or random integers.
This would cause an Out-of-bound read later when trying to find a
StartMask.

This change checks if all indices in the Mask is in the allowed range,
and fixes the out-of-bound accesses.

Differential Revision: https://reviews.llvm.org/D132634
2022-10-10 12:52:31 +01:00
Matt Devereau d48e63074f [AArch64][SVE] Fix AArch64_SVE_VectorCall calling convention
This fixes the case where callees with SVE arguments outside of the z0-z7
range were incorrectly deduced as SVE calling convention functions
2022-10-10 10:25:29 +00:00
Nikita Popov 874c0327e7 [Attributor] Use ConstantFoldLoadFromConst()
When determining the initial value of the object, use the constant
folding API to load a given type at a given offset in the global
initializer. This makes it work for cases where the load doesn't
directly correspond to an aggregate member.

Differential Revision: https://reviews.llvm.org/D135435
2022-10-10 10:17:37 +02:00
LiaoChunyu a835b92e6c [RISCV] Use hasAllWUsers to recover XORI/ORI
reference 0fbe71e91f.

Also add testcase for addi.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135538
2022-10-10 14:16:50 +08:00
Lang Hames d6c9b3cc34 [ORC] Relax assertions in SimpleRemoteEPCTransport.
Null source/destination pointers are ok for zero-sized messages.
2022-10-09 21:58:10 -07:00
Lang Hames aedeb8d557 [LLJIT] Default to EPCEHFrameRegistrar rather than InProcessEHFrameRegistrar.
Now that ExecutionSession objects alway have ExecutorProcessControl (EPC)
objects attached we can use EPCEHFrameRegistrar by default, rather than
InProcessEHFrameRegistrar. This allows LLJIT to work out-of-the-box with remote
EPCs on platforms that use JITLink, without requiring a custom
ObjectLinkingLayerCreator to override the eh-frame registrar.
2022-10-09 21:58:10 -07:00
Mingming Liu 159fb378f7 [AArch64] Swap 'lsl(val1,small-shmt)' to right hand side for AND(lsl(val1,small-shmt), lsl(val2,large-shmt))
On many aarch64 processors (Cortex A78, Neoverse N1/N2/V1, etc), ADD with LSL shift (shift-amount <= 4) has smaller latency and higher
throughput than ADD with larger shift (shift-amunt > 4). This is at least no-op for the rest of the processors.

Differential Revision: https://reviews.llvm.org/D135208
2022-10-09 17:26:54 -07:00
Florian Hahn fee8f561bd
[ConstraintElimination] Include index type scale.
The current decomposition for GEPs did not correctly handle cases where
GEPs access different source types. Adjust the constraints by including
the indexed type-size as coefficients.

Further generalization to allow GEPs with more than one index is a
needed general follow-up improvement.
2022-10-09 21:53:30 +01:00
luxufan eaf6e2fc33 [DSE] Relax constraint on isGuaranteedLoopInvariant
If the location ptr to be killed is in no loop and the Function does not
have irreducible loops, then we can regard it as loop invariant.

Differential Revision: https://reviews.llvm.org/D135369
2022-10-06 03:01:21 +00:00
Yeting Kuo 7329dc0cc3 [RISCV][NFC] Fix unused variable warning.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135365
2022-10-09 21:39:30 +08:00
gonglingqin 5593d36356 [LoongArch] Expand fptrunc store from f64 to f32
Differential Revision: https://reviews.llvm.org/D135510
2022-10-09 17:55:42 +08:00
Florian Hahn 11a6e64ba7
[ConstraintElim] Move logic to get constraint for solving to helper.
Move common logic shared by callers of getConstraint that use the result
to query the constraint system to a new helper getConstraintForSolving.

This includes common legality checks (i.e. not an equality constraint,
no new variables) and the logic to query the unsigned system if possible
for signed predicates.
2022-10-09 10:44:36 +01:00
Ting Wang bc5e969ca1 [PowerPC] Add vector pair calling convention for AIX
This is AIX part of update after https://reviews.llvm.org/D117225

Fixed the issue that AIX64 with vector pair enabled saw redundant
spill/reload of callee saved vector registers.

Based on original patch by: Kai Luo

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D133466
2022-10-09 01:23:18 -04:00
WANG Xuerui 31327c29fb [LoongArch] Don't merge FrameIndex accesses into [F]{LD,ST}X
Otherwise eliminateFrameIndex cannot figure out how to fixup the stack
offset with its stateless logic, because there wouldn't be an immediate
slot for it to trivially write to, and it may not be easy to transform
the surrounding code to make it work.

This fixes a fairly common crash when compiling moderately complex code with
Clang.

Differential Revision: https://reviews.llvm.org/D135251
2022-10-09 13:04:21 +08:00
Tatsuyuki Ishi 5699692dfc [Support] Add fast path for StringRef::find with needle of length 2.
InclusionRewriter on Windows (CRLF line endings) will exercise this in a
hot path. Calling memcmp repeatedly would be highly suboptimal for that
use case, so give it a specialized path.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D133660
2022-10-09 00:58:07 +00:00
Fangrui Song 4fbe33593c [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
See the updated linkonce_resolution_comdat.ll. For a local linkage GV in a
non-prevailing COMDAT, it remains defined while its leader has been made
available_externally. This violates the COMDAT rule that its members must be
retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
```

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-10-08 11:09:43 -07:00
Florian Hahn e0136a62cc
[ConstraintElimination] Support chained GEPs with constant offsets.
Handle the (gep (gep ....), C) case by incrementing the constant
coefficient of the inner GEP, if C is a constant.
2022-10-08 16:59:27 +01:00
Florian Hahn 73950f26f5
[LV] Replace check with assert for reduction resume values (NFC).
At this point, we need to have resume values for all inductions. If not,
this would result in silent mis-compiles.
2022-10-08 16:26:10 +01:00
Florian Hahn be858bda69
[ConstraintElimination] Remove unused function (NFC). 2022-10-08 16:05:56 +01:00
Sanjay Patel eccb9a77c6 [InstCombine] fold exact sdiv to ashr (2nd try)
The 1st attempt failed to updated the test checks as expected.

Original commit message:

sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)

https://alive2.llvm.org/ce/z/kB6VF7

It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
2022-10-08 10:09:44 -04:00
Sanjay Patel 68d4dbc2c1 Revert "[InstCombine] fold exact sdiv to ashr"
This reverts commit fe15290e0c.
The test checks were not updated as expected.
2022-10-08 10:02:03 -04:00
Sanjay Patel fe15290e0c [InstCombine] fold exact sdiv to ashr
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)

https://alive2.llvm.org/ce/z/kB6VF7

It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
2022-10-08 09:23:46 -04:00
Florian Hahn 9d31d1c214
[ConstraintElimination] Use logic from 3771310eed for queries only.
The logic added in 3771310eed was placed sub-optimally. Applying the
transform in ::getConstraint meant that it would also impact conditions
that are added to the system by the signed <-> unsigned transfer logic.

This meant we failed to add some signed facts to the signed system. To
make sure we still add as many useful facts to the signed/unsigned
systems, move the logic to the point where we query the system.
2022-10-08 11:03:45 +01:00
Freddy Ye 566c277c64 [X86] Remove AVX512VP2INTERSECT from Sapphire Rapids.
For more details, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D135509
2022-10-08 14:54:03 +08:00
gonglingqin f4ccb577f8 [LoongArch] Do not assert value type in isFPImmLegal
This patch fixes the failure of llvm/test/CodeGen/Generic/vector.ll and
CodeGen/PowerPC/2007-11-19-VectorSplitting.ll for a LoongArch native build.

Differential Revision: https://reviews.llvm.org/D134798
2022-10-08 14:50:48 +08:00
wanglei 730ee6568c [LoongArch] Set correct encodings for DWARF exception handling
This patch sets correct encodings for DWARF exception handling for
LoongArch.

Differential Revision: https://reviews.llvm.org/D134710
2022-10-08 11:53:48 +08:00
Craig Topper f749b2d9a5 [RISCV] Fix incorrect parenthese placement in comment. NFC 2022-10-07 17:16:38 -07:00
Craig Topper 9f67047cf0 [VP][RISCV] Add vp.smax/smin/umax/umin intrinsics
Differential Revision: https://reviews.llvm.org/D135418
2022-10-07 17:14:31 -07:00
Krzysztof Parzyszek 09d84e0ad8 [Hexagon] Implement helper to get intrinsic for instruction opcode
There are intrinsics for most scalar instructions and almost all HVX
instructions. What's somewhat painful is that there are two intrinsics
for each HVX instruction: one for 64- and one for 128-byte mode.
Instead of checking the current codegen settings every time, this
function would simply return the right intrinsic.
2022-10-07 15:56:06 -07:00
Amaury Séchet 62ea6c5be7
[DAGCombine] Deduplicate addcarry node using commutativity.
The first two parameters of addcarry are commutative. We may face a situation where both variant are present in the DAG, in which case we benefit from using just one.

Depends on D57302 and D33587

Reviewed By: RKSimon, chfast

Differential Revision: https://reviews.llvm.org/D57317
2022-10-08 00:55:14 +02:00
Jessica Paquette 42cb2f8b12 [GlobalISel] Mark mi_match as nodiscard
Typically when you match something, you want to check the result.

Fix a couple warnings in the AMDGPUPostLegalizerCombiner which appear as a
result of this.

Differential Revision: https://reviews.llvm.org/D135491
2022-10-07 15:47:05 -07:00
Arthur Eubanks f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Pavel Chupin 27ef42bec8 Fix warnings in build done by clang-based compiler
Differential Revision: https://reviews.llvm.org/D135230
2022-10-07 14:12:10 -07:00
Artem Belevich 9a01cca660 Add support for CUDA-11.8 and sm_{87,89,90} GPUs.
Differential Revision: https://reviews.llvm.org/D135306
2022-10-07 13:59:28 -07:00
Florian Hahn 13ac102726
[LoopSimplifyCFG] Invalidate SCEV dispositions.
Clear all dispositions if there are any dead blocks (which will get
removed later) and also clear dispositions for removed instructions.

Clearing all dispositions in case there are dead blocks happens first,
which should avoid traversing SCEV use-lists for invalidating
dispositions for individual values.

Fixes #58179.
2022-10-07 21:35:42 +01:00
Matt Arsenault 74ef03d38a AMDGPU: Update SlotIndexes independently of LiveIntervals
Apparently StackColoring depends on SlotIndexes, but not
LiveIntervals. If regalloc fast were manually requested, LiveIntervals
would be dropped before SILowerSGPRSpills but not SlotIndexes.

SILowerSGPRSpills preserved SlotIndexes, but only through
LiveIntervals. As a result, SILowerSGPRSpills was incorrectly
reporting it preserved SlotIndexes. Start updating these directly,
instead of depending on LiveIntervals also being available.
2022-10-07 13:15:15 -07:00
Florian Hahn 19ad1cd5ce
Recommit "[SCEV] Support clearing Block/LoopDispositions for a single value."
This reverts commit 92f698f01f.

The updated version of the patch includes handling for non-SCEVable
types. A test case has been added in ec86e9a99b.
2022-10-07 20:15:44 +01:00
Philip Reames cb66e123c6 Remove PlaceSafepoints pass
This patch was added way back in the beginning of the work which became the statepoint infrastructure. The idea was that safepoints could be inserted late in the optimization pipeline. This is true if the only concern is garbage collection, but this approach turned out to be incompatible with the requirement to also support deoptimization at safepoints.

In theory, this pass would still be quite useful for an AOT compiled language which wants to support garbage collection, but we have no known users, and haven't for over 5 years. Time to remove unused code. If someone wants to use this, restoring it would not be hard. The immediate motivation for removal is that this is one of the last passes remaining which hasn't been ported to the new pass manager and the (straight forward) work to do so is not justified for unused code.

Differential Revision: https://reviews.llvm.org/D135371
2022-10-07 11:51:00 -07:00
Sanjay Patel 3e6767ed5f [InstCombine] propagate 'exact' when converting ashr to lshr
The shift amount is not changing, so if we guaranteed
shifting out zeros before, those bits are still zeros.

https://alive2.llvm.org/ce/z/sokQca
2022-10-07 13:17:19 -04:00
Florian Hahn 92f698f01f
Revert "[SCEV] Support clearing Block/LoopDispositions for a single value."
This reverts commit 9e931439dd.

This commit causes a crash when TSan, e.g. with
https://lab.llvm.org/buildbot/#/builders/70/builds/28309/steps/10/logs/stdio

Reverting while I extract a reproducer and submit a fix.
2022-10-07 17:58:54 +01:00
Ellis Hoag 70fb7bb561 [InstrProf][llvm-profdata] Dump profile correlation data as YAML
Change the behavior of the `llvm-profdata show --debug-info=` command to dump a YAML file when using debug info correlation since it provides more information in a parseable format.

Reviewed By: yozhu, phosek

Differential Revision: https://reviews.llvm.org/D134770
2022-10-07 09:47:25 -07:00
Krzysztof Parzyszek d184045d36 [Hexagon] Formatting changes, NFC 2022-10-07 09:13:51 -07:00
Krzysztof Parzyszek e492cdc358 [Hexagon] Add couple of helper functions in HexagonVectorCombine
1. `length(value/type)`: return the number of elements in the vector
   input,
2. `getHvxTy(elem_type)`: return the HVX vector type with the element
   type provided.

These will help write things more succintly.
2022-10-07 09:10:08 -07:00
Krzysztof Parzyszek 06019b8e55 [Hexagon] Add default parameter to HexagonVectorCombine::getIntTy, NFC 2022-10-07 08:52:19 -07:00
Krzysztof Parzyszek d376b2667a [Hexagon] Make HexagonSubtarget::isHVXVectorType take EVT instead of MVT
EVT can be created for any Type, and so this function can now be used to
check if given Type, as-is, is an HVX type (as opposed to a type that may
be subject to legalization to an HVX type).
2022-10-07 08:42:39 -07:00
Sanjay Patel bdfefac9a4 [InstCombine] refactor sdiv by (negative) power-of-2 folds; NFCI
It's probably better to try harder on this kind of
pattern by using ValueTracking.
2022-10-07 11:35:17 -04:00
Kazu Hirata 7f90597be6 [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:800:17:
  error: unused variable 'DST_IDX' [-Werror,-Wunused-variable]
2022-10-07 08:27:02 -07:00
Krzysztof Parzyszek 2216d8f6b8 [Hexagon] Replace llvm::Optional with std::optional, NFC 2022-10-07 08:23:39 -07:00
Krzysztof Parzyszek 473210ae90 [Hexagon] Constify member refererence, NFC 2022-10-07 08:23:39 -07:00
Florian Hahn 9e931439dd
[SCEV] Support clearing Block/LoopDispositions for a single value.
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.

We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134614
2022-10-07 16:07:17 +01:00
Bjorn Pettersson 01e1f32971 [ValueTracking][SimplifyLibCalls] Fix bug in getConstantDataArrayInfo for wchar_t
When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (to match with the size in bits of a wchar_t).

Problem I've seen is that llvm::getConstantDataArrayInfo is taking
both an "ElementSize" argument (basically indicating size of a
char/element in bits) and an "Offset" which afaict is an offset
in the unit "number of elements". Then it also use
stripAndAccumulateConstantOffsets to get a "StartIdx" which afaict
is calculated in bytes. The returned Slice.Length is based on
arithmetics that add/subtract variables that are having different
units (bytes vs elements). Most notably I think the "StartIdx" must
be scaled using the "ElementSize" to get correct results.

The symptom of the above problem was seen in the wcslen-1.ll test
case which miscompiled.

This patch is supposed to resolve the bug by converting between
bytes and elements when needed.

Differential Revision: https://reviews.llvm.org/D135263
2022-10-07 15:29:32 +02:00
Dmitry Preobrazhensky 8f8e4e3b38 [AMDGPU][MC][GFX11] Correct v_fmac_.*_e64_dpp
Differential Revision: https://reviews.llvm.org/D134961
2022-10-07 16:21:55 +03:00
Dmitry Preobrazhensky 1d1c7555e2 [AMDGPU][GFX11][NFC] Refactor VOPD handling in codegen
Differential Revision: https://reviews.llvm.org/D135084
2022-10-07 16:13:05 +03:00
Dmitry Preobrazhensky fd7b0eeaf6 [AMDGPU][MC][GFX11] Add VOPD VGPR bank access validation
Differential Revision: https://reviews.llvm.org/D134960
2022-10-07 15:52:59 +03:00
Jan Sjodin 4627cef113 [OpenMP][OMPIRBuilder] Migrate emitOffloadingArraysArgument from clang
This patch moves the emitOffloadingArraysArgument function and
supporting data structures to OpenMPIRBuilder. This will later be used
in flang as well. The TargetDataInfo class was split up into generic
information and clang-specific data, which remain in clang. Further
migration will be done in in the future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D134662
2022-10-07 07:03:03 -05:00
zhongyunde 75358f060c [AArch64] Lower multiplication by a constant int to madd
Lower a = b * C -1 into madd
  a) instcombine change b * C -1 --> b * C + (-1)
  b) machine-combine change b * C + (-1) --> madd

Assembler will transform the neg immedate of sub to add, see https://gcc.godbolt.org/z/cTcxePPf4
Fixes AArch64 part of https://github.com/llvm/llvm-project/issues/57255.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134336
2022-10-07 19:33:47 +08:00
Florian Hahn 3771310eed
[ConstraintElimination] Convert to unsigned Pred if possible.
Convert SLE/SLT predicates to unsigned equivalents if both operands are
known to be signed-positive.

https://alive2.llvm.org/ce/z/tBeiZr
2022-10-07 12:27:36 +01:00
Nikita Popov b43a4d0850 [LoopPeeling] Support peeling loops with non-latch exits
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.

It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.

The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.

Differential Revision: https://reviews.llvm.org/D134803
2022-10-07 12:35:52 +02:00
Nikita Popov ccf53cae32 [ValueTracking] Remove unused Offset argument in getConstantStringInfo() (NFC) 2022-10-07 11:35:55 +02:00
eopXD dbc681c98e [VP][RISCV] Add vp.roundtozero and its RISC-V support
The scalar instruction of this is `llvm.trunc`. However the naming of
ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this as
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So adding
`vp.roundtozero` that will look similar to `vp.roundeven`.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D135233
2022-10-07 02:15:23 -07:00
Dmitry Makogon 8307f6c854 [LoopPredication] Insert assumes of conditions of predicated guards
As LoopPredication performs non-equivalent transforms removing some
checks from loops, other passes may not be able to perform transforms
they'd be able to do if the checks were left in loops.

This patch makes LoopPredication insert assumes of the replaced
conditions either after a guard call or in the true block of
widenable condition branch.

Differential Revision: https://reviews.llvm.org/D135354
2022-10-07 16:10:24 +07:00
Nikita Popov 333246b48e Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
Relative to the previous attempt, this adjusts simplification to
use the correct context instruction: We need to use the terminator
of the incoming block, not the original instruction.

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.

Differential Revision: https://reviews.llvm.org/D134954
2022-10-07 11:04:19 +02:00
Pierre van Houtryve 36c3833783 [GISel] Add Trunc/Lshr/BuildVector Folding
Similar to the current "Trunc/BuildVector" folding - which folds low element extracts of BuildVectors, folds hi element extracts done using bitshifts.

For D134354

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135148
2022-10-07 08:44:03 +00:00
Pierre van Houtryve a34977c4d0 [GISel] Handle G_TRUNC in `matchExtractVecEltBuildVec`
Spotted some cases in D134354 where this was an issue.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135147
2022-10-07 08:37:18 +00:00
Mike Hommey d3b0e745e8 [CodeView] Avoid NULL deref of Scope
Regression from D131400: cross-language LTO causes a crash in the
compiler on the NULL deref of Scope in `isa` call when Rust IR is
involved. Presumably, this might affect other languages too, and
even Rust itself without cross-language LTO when the Rust compiler
switched to LLVM 16.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D134616
2022-10-07 08:34:57 +02:00
Xiang Li 220185552f [DirectX backend] Add analysis to collect DXILResources
Now only DXILTranslateMetadata uses DXILResources, so DXILResourceWrapper is only used by DXILTranslateMetadata.
Once we add lower for createHandle, DXILResourceWrapper will be used in more passes.
Also we can add resource index allocation in DXILResourceWrapper.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D135190
2022-10-06 19:34:29 -07:00
Leonard Chan 69695817f5 [llvm] Clear the ForwardRefDSOLocalEquivalentIDs map
I accidentally cleared ForwardRefDSOLocalEquivalentNames twice instead.

Differential Revision: https://reviews.llvm.org/D135315
2022-10-06 23:13:42 +00:00
Craig Topper 3b20765cf7 [RISCV] Use mask agnostic policy for isel patterns where the merge operand is IMPLICIT_DEF.
I tend to think we should ignore the policy bit in vsetvli insertion
if the tied operand is IMPLICIT_DEF. But that raises questions about
what the policy operand on RVV intrinsics means if you also pass
vundefined().

This change at least fixes some cases. I'll post a separate patch
for vsetvli insertion for discussion.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135386
2022-10-06 15:44:39 -07:00
Alina Sbirlea 0fbeca0ee6 [SmallVector] Reallocate if assigned memory is right after the current vector, created with capacity 0
Potential solution for
https://github.com/llvm/llvm-project/issues/57324.

Differential Revision: https://reviews.llvm.org/D132512
2022-10-06 15:37:43 -07:00
Philip Reames 79f0413e5e [RISCV] Use branchless form for selects with -1 in either arm
We can lower these as an or with the negative of the condition value. This appears to result in significantly less branch-y code on multiple common idioms (as seen in tests).

Differential Revision: https://reviews.llvm.org/D135316
2022-10-06 15:18:43 -07:00
Philip Reames 027516553d [RISCV] Verify that policy operands only exist on instructions with tied passthru operands
This is a non-trivial property relied upon by D135396. I wrote this to convince myself it was true.

Differential Revision: https://reviews.llvm.org/D135403
2022-10-06 15:18:43 -07:00
Shubham Sandeep Rastogi f491b898c5 Revert "Remove the dependency between lib/DebugInfoDWARF and MC."
This reverts commit d96ade00c3.
2022-10-06 14:58:34 -07:00
Shubham Sandeep Rastogi d96ade00c3 Remove the dependency between lib/DebugInfoDWARF and MC.
This patch had to be reverted because on gcc 7.5.0 we see an error converting from std::unique_ptr<MCRegisterInfo> to Expected<std::unique_ptr<MCRegisterInfo>> as the return type for the function createRegInfo. This has now been fixed.
2022-10-06 14:46:01 -07:00
Philip Reames 04bb32e58a [DAG] Extract helper for (neg x) [nfc]
This is a frequently reoccurring pattern, let's factor it out.

Differential Revision: https://reviews.llvm.org/D135301
2022-10-06 13:23:52 -07:00
Alina Sbirlea b9898e7ed1 Revert "Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit e94619b955.
2022-10-06 13:12:24 -07:00
Alexey Bataev 323ed2308a [SLP]Improve/fix CSE analysis of the blocks/instructions.
Added analysis for invariant extractelement instructions and improved
detection of the CSE blocks for generated extractelement instructions.

Differential Revision: https://reviews.llvm.org/D135279
2022-10-06 12:08:48 -07:00
Alex Brachet 4ea1a647ff [PGO] Make emitted symbols hidden
Differential Revision: https://reviews.llvm.org/D135340
2022-10-06 18:28:16 +00:00
Joao Moreira eac3e5c3fb [X86] Do not emit JCC to __x86_indirect_thunk
Clang may optimize conditional tailcall blocks with the following layout:

cmp <condition>
je  tailcall_target
ret

When retpoline is in place, indirect calls are converted into direct calls to a retpoline thunk. When these indirect calls are tail calls, they may be subject to the above described optimization (there is no indirect JCC, but since now the jump is direct it can be made conditional). The above layout is non-ideal for the Linux kernel scenario because the branches into thunks may be patched back into indirect branches during runtime depending on the underlying CPU features, what would not be feasible if the binary is emitted with the optimized layout above.

Thus, prevent clang from emitting this it if CodeModel is Kernel.

Feature request from the respective kernel mailing list: https://lore.kernel.org/llvm/Yv3uI%2FMoJVctmBCh@worktop.programming.kicks-ass.net/

Reviewed By: nickdesaulniers, pengfei

Differential Revision: https://reviews.llvm.org/D134915
2022-10-06 11:09:24 -07:00
Bjorn Pettersson 0db4b1d1a8 [SimplifyLibCalls] Adjust code comment in optimizeStringLength. NFC
The limitation in LibCallSimplifier::optimizeStringLength to only
optimize when the string is an i8 array was changed already in
commit 50ec0b5dce back in 2017.

We still only simplify when 's' points at an array of 'CharSize', so
the comment is still valid in the sense that we do not support
arbitrary array types.

Differential Revision: https://reviews.llvm.org/D135261
2022-10-06 20:00:27 +02:00
Arthur Eubanks ae5733346f Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI"
This reverts commit cd8f3e7581.

Causes miscompiles, see D132657
2022-10-06 10:36:02 -07:00
Sanjay Patel 8da2fa856f [InstCombine] fold sdiv with hidden common factor
(X * Y) s/ (X << Z) --> Y s/ (1 << Z)

https://alive2.llvm.org/ce/z/yRSddG

issue #58137
2022-10-06 13:11:50 -04:00
Ivan Tetyushkin 0e6c1576e6 [RISCV] Optimization for using compressed beqz and bnez PR#56391
Optimization for using compressed beqz and bnez

If there is pattern
```
br_cc val1 constval eq/neq place
select_cc val1 constval eq/neq trueval falseval
```
and constval does not fit in compressed imm format(6 bit), but fit in
imm format(12 bit), we can replace by non compress sub and compress
c.beqz/c.bneqz:

```
addi val val -constval
c.beqz val place
```

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132839
2022-10-06 09:33:32 -07:00
Shubham Sandeep Rastogi 870b74d590 Revert "Remove the dependency between lib/DebugInfoDWARF and MC."
This reverts commit 0008990479.
2022-10-06 09:30:46 -07:00
Shubham Sandeep Rastogi 0008990479 Remove the dependency between lib/DebugInfoDWARF and MC.
Differential Revision: https://reviews.llvm.org/D134817
2022-10-06 09:25:57 -07:00
Florian Hahn a7ac0dd0cf
[ConstraintElimination] Generalize AND matching.
Extend more general matching used for chains of ORs to also support
chains of ANDs.
2022-10-06 17:17:38 +01:00
Tyker 1654b22ac0 [ADT] Add support for more formats in APFixedPoint
Prior to this patch FixedPointSemantics and APFixedPoint only support semantics where
the Scale larger or equal to zero and the Width is larger or equal to the Scale.
This patch removes both those requirements while staying API compatible.
2022-10-06 17:55:31 +02:00
Sanjay Patel 6b869be810 [InstCombine] fold udiv with hidden common factor
(X * Y) u/ (X << Z) --> Y u>> Z

https://alive2.llvm.org/ce/z/4G9D_W
2022-10-06 11:35:27 -04:00
Philip Reames d89d45ca9a [RISCV][InsertVSETVLI] Default to MA not MU
This changes the default value used for mask policy from mask undisturbed to mask agnostic. In hardware, there may be a minor preference for ta/ma, but since this is only going to apply to instructions which don't use the mask policy bit, this is functionally mostly a nop. The main value is to make future changes to using MA when legal for masked instructions easier to review by reducing test churn.

The prior code was motivated by a desire to minimize state transitions between masked and unmasked code. This patch achieves the same effect using the demanded field logic (landed in afb45ff), and there are no regressions I spotted in the test diffs. (Given the size, I have only been able to skim.) I do want to call out that regressions are possible here; the demanded analysis only works on a block local scope right now, so e.g. a tight loop mixing masked and unmasked computation might see an extra vsetvli or two.

Differential Revision: https://reviews.llvm.org/D133803
2022-10-06 07:59:39 -07:00
Florian Hahn 8e3e96298f
[ConstraintElimination] Order cmps for signed <-> unsigned transfer first.
Make sure conditions with constant operands come before conditions
without constant operands. This increases the effectiveness of the
current signed <-> unsigned fact transfer logic.
2022-10-06 15:56:25 +01:00
Philip Reames afb45ffce7 [RISCV][InsertVSETVLI] Treat mask policy as undemanded if usesMaskPolicy is false
Differential Revision: https://reviews.llvm.org/D135327
2022-10-06 07:20:16 -07:00
Florian Hahn 349375d093
[ConstraintElimination] Generalize OR matching.
Extend OR handling to traverse chains of ORs.
2022-10-06 11:56:22 +01:00
Nikita Popov 028874dd61 [Local] Fix unused variable warnings (NFC) 2022-10-06 10:30:59 +02:00
Nikita Popov 3d0b5f019e [AA] Remove unused template argument from AAResultBase (NFC)
After D94363, there is no more need to use CRTP here.
2022-10-06 10:21:17 +02:00
Nikita Popov c5bf452022 [AA] Pass AAResults through AAQueryInfo
Currently, AAResultBase (from which alias analysis providers inherit)
stores a reference back to the AAResults aggregation it is part of,
so it can perform recursive alias analysis queries via
getBestAAResults().

This patch removes the back-reference from AAResultBase to AAResults,
and instead passes the used aggregation through the AAQueryInfo.
This can be used to perform recursive AA queries using the full
aggregation.

Differential Revision: https://reviews.llvm.org/D94363
2022-10-06 10:10:19 +02:00
Nikita Popov 6053b37e45 [AA] Thread AAQI through getModRefBehavior() (NFC)
This is in preparation for D94363, as we will need AAQI to
perform the recursive call to the function variant.
2022-10-06 09:57:42 +02:00
Pierre van Houtryve 3ec0085c3f [DAG] Update `isKnownNeverNaN` for `FMA/FMAD`
We can still get a NaN even if none of the operands are NaN,
e.g. from +inf/-inf. D50804 didn't catch that.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134854
2022-10-06 06:52:36 +00:00
Florian Hahn 7449570ff7
[ConstraintElimination] Use ConstraintTy::IsSigned instead of Predicate.
This should be NFC and ensure the sign of the constraint is used
consistently in the future.
2022-10-06 07:51:49 +01:00
Pierre van Houtryve bb71079e30 [AMDGPU][GISel] Add missing V2S16 BUILD_VECTOR_TRUNC legalization
Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135149
2022-10-06 06:48:53 +00:00
Ilia Diachkov 25ee36c6b1 [SPIRV] read kernel arg attributes from fuction/module metadata
The patch introduces reading the attributes of kernel arguments both from
function-attached and module-level metadata, during kernel arguments lowering.
Two tests are added to show the improvement.

Differential Revision: https://reviews.llvm.org/D135106

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey.tretyakov@mail.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
2022-10-06 04:43:52 +03:00
Carl Ritson c316332e17 [Sink] Allow sinking of invariant loads across critical edges
Invariant loads can always be sunk.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D135133
2022-10-06 09:21:12 +09:00
Sami Tolvanen 43f4c215a1 [AArch64][KCFI] Define Size for KCFI_CHECK
Specify the correct size for the KCFI_CHECK pseudo
instruction, which is lowered into six 4-byte instructions in
AArch64AsmPrinter::LowerKCFI_CHECK.

Link: https://github.com/ClangBuiltLinux/linux/issues/1730
2022-10-05 22:24:50 +00:00
Quentin Colombet 6e440ee2aa [RISCV][ISel] Finally fix the UBSan error
Forgot another SDValue check and a boolean initialization.
2022-10-05 21:43:09 +00:00
Quentin Colombet 6bbe7d376e [RISCV][ISel] Attempt to fix UBSan error
Explicitly check an SDValue with the invalid SDValue.

UBSan reports:
runtime error: load of value 36, which is not a valid value for type
'bool'

https://lab.llvm.org/buildbot/#/builders/85/builds/11231
2022-10-05 20:59:28 +00:00
Quentin Colombet c5c2de287e [RISCV][ISel] Fold extensions when all the users can consume them
This patch allows the combines that fold extensions in binary operations
to have more than one use.
The approach here is pretty conservative: if all the users of an
extension can fold the extension, then the folding is done, otherwise we
don't fold.
This is the first step towards avoiding the one-use limitation.

As a result, we make a decision to fold/don't fold for a web of
instructions. An instruction is part of the web of instructions as soon
as it consumes an extension that needs to be folded for all its users.

Because of how SDISel works a web of instructions can be visited over
and over. More precisely, if the folding happens, it happens for the
whole web and that's the end of it, but if the folding fails, the whole
web may be revisited when another member of the web is visited.

To avoid a compile time explosion in pathological cases, we bail out
earlier for webs that are bigger than a given threshold (arbitrarily set
at 18 for now.) This size can be changed using
`--riscv-lower-ext-max-web-size=<maxWebSize>`.

At the current time, I didn't see a better scheme for that. Assuming we
want to stick with doing that in SDISel.

Differential Revision: https://reviews.llvm.org/D133739
2022-10-05 20:49:21 +00:00
David Green 03b145480d [AArch64] Add tablegen patterns for bf16 trn/zip/uzp.
This adds some missing tablegen patterns to handle trn1/trn2/zip1/zip2/uzp1/uzp2,
similar to the Arm handling in 5e1a9d319d, but via tablegen
patterns for the AArch64 backend.
2022-10-05 21:47:36 +01:00
Leonard Chan 34004d2d03 Fix build failures from 4f6477a615. 2022-10-05 18:54:28 +00:00
Leonard Chan 4f6477a615 [llvm][NFC] Consolidate equivalent function type parsing code into
single function

Differential Revision: https://reviews.llvm.org/D135296
2022-10-05 18:41:19 +00:00
Quentin Colombet 4852f26acd [RISCV][ISel] Refactor the formation of VW operations
This patch centralizes all the combines of add|sub|mul with extended
operands in one "framework".
The rationale for this change is to offer a one-stop-shop for all these
transformations so that, in the future, it is easier to make combine
decisions for a web of instructions (i.e., instructions connected
through s|zext operands).

Technically this patch is not NFC because the new version is more
powerful than the previous version.
In particular, it diverges in two cases:
- VWMULSU can now also be produced from `mul(splat, zext)`, whereas
  previously only `mul(sext, splat)` were supported when `splat`s were
  involved. (As demonstrated in rvv/fixed-vectors-vwmulsu.ll)
- VWSUB(U) can now also be produced from `sub(splat, ext)`, whereas
  previously only `sub(ext, splat)` were supported when `splat`s were
  involved. (As demonstrated in rvv/fixed-vectors-vwsub.ll)

If we wanted, we could block these transformations to make this
patch really NFC. For instance, we could do something similar to
`AllowSplatInVW_W`, which prevents the combines to form vw(add|sub)(u)_w
when the RHS is a splat.

Regarding the "framework" itself, the bulk of the patch is some
boilderplate code that abstracts away the actual extensions that are
present in the DAG. This allows us to handle `vwadd_w(ext a, b)` as if
it was a regular `add(ext a, ext b)`. Since the node `ext b` doesn't
actually exist in the DAG, we have a bunch of methods (all in the
NodeExtensionHelper class) that fake all that for us.

The other half of the change is around `CombineToTry` and
`CombineResult`. These helper structures respectively:
- Represent the kind of combines that can be applied to a node, and
- Store what needs to happen to do that combine.

This can be viewed as a two step approach:
- First, check if a pattern applies, and
- Second apply it.

The checks and the materialization of the combines are decoupled so that
in the future we can perform several checks and do all the related
applies in one go.

Differential Revision: https://reviews.llvm.org/D134703
2022-10-05 17:43:48 +00:00
Ellis Hoag 549773f9e9 [Dwarf] Reference the correct CU when inlining
Sometimes when a function is inlined into a different CU, `llvm-dwarfdump --verify` would find an inlined subroutine with an invalid abstract origin. This is because `DwarfUnit::addDIEEntry()` will incorrectly assume the inlined subroutine and the abstract origin are from the same CU if it can't find the CU for the inlined subroutine.

In the added test, the inlined subroutine for `bar()` is created before the CU for `B.swift` is created, so it tries to point to `goo()` in the wrong CU. Interestingly, if we swap the order of the two functions then we don't see a crash since the module for `goo()` is created first.

The fix is to give a parent DIE to `ScopeDIE` before calling `addDIEEntry()` so that its CU can be found. Luckily, `constructInlinedScopeDIE()` is only called once so we can pass it the DIE of the scope's parent and give it a child just after it's created.

`constructInlinedScopeDIE()` should always return a DIE, so assert that it is not null.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D135114
2022-10-05 09:19:12 -07:00
Alexandre Ganea 1c25ce1738 [Orc] Fix the SharedMemoryMapper dtor
As briefly discussed on https://reviews.llvm.org/rG1134d3a03facccd75efc5385ba46918bef94fcb6, fix the unintended copy while iterating on Reservations and add a mutex guard, to be symmetric with other usages of Reservations.

Differential revision: https://reviews.llvm.org/D134212
2022-10-05 12:16:54 -04:00
Florian Hahn 9aa004a04c
[ConstraintElimination] Convert NewIndices to vector and rename (NFCI).
The callers of getConstraint only require a list of new variables.
Update the naming and types to make this clearer.
2022-10-05 16:25:00 +01:00
Joe Nash 203d0b0ee1 [AMDGPU] Fix V_CMP_CLASS_F16_t16_e64 src1 type.
For V_CMP_CLASS_F16_t16_e64 and V_CMPX_CLASS_F16_t16_e64,
https://reviews.llvm.org/D133723 changed the value type of src1 from i32 to i16.
These src1 operands are 16 bits, therefore need to be encoded as true16
operands. So the _e32 type was correctly set to VGPR_32_Lo128.
In _e64 form the operand class went from
VSrc_b32 to VSrc_b16. For some reason, we cannot encode inline literals for
VSrc_b16, see 5f5f566b26. In this phase of
the true16 implementation, VSrc_b16 and VSrc_b32 are still similar,
except from that quirk of inlines. So set the operand class to regain
that function.

Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D134897
2022-10-05 11:15:40 -04:00
Michael Maitland fbf8aa1b49 [RISCV][CodeGen] Add Scheduling for vset{i}vl{i} instruction
Differential Revision: https://reviews.llvm.org/D135188
2022-10-05 07:54:22 -07:00
Juan Manuel MARTINEZ CAAMAÑO fa2b1cb8c9 [NFC][AMDGPULowerKernelAttributes] Factorize repeated code into function
Differential Revision: https://reviews.llvm.org/D135266
2022-10-05 09:26:39 -05:00
Anton Sidorenko 3e97e94237 [NFC][RISCV] Move getSEWLMULRatio function to header
More uses of getSEWLMULRatio will be added in D130895.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D135086
2022-10-05 15:10:53 +01:00
Johannes Doerfert e18736149c [Attributor] Teach AAPointerInfo about atomic cmxchg and rmw
The atomic operations behave similar to a store except that we don't
know the new value and we read the result first.
2022-10-05 06:48:00 -07:00
Dmitry Preobrazhensky f4b1cfa1cb [AMDGPU][MC][GFX11] Correct e64_dpp variants of v_movreld and v_movrelsd
Differential Revision: https://reviews.llvm.org/D135079
2022-10-05 16:47:18 +03:00
Kerry McLaughlin f7f44f018f [AArch64][SME] Set up a lazy-save/restore around calls.
Setting up a lazy-save mechanism around calls is done during SelectionDAG
because calls to intrinsics may be expanded into an actual function call
(e.g. calls to @llvm.cos()), and maintaining an allowed-list in the SMEABI
pass is not feasible.

The approach for conditionally restoring the lazy-save based on the runtime
value of TPIDR2_EL0 is similar to how we handle conditional smstart/smstop.
We create a pseudo-node which gets expanded into a conditional branch and
expands to a call to __arm_tpidr2_restore(%tpidr2_object_ptr).

The lazy-save buffer and TPIDR2 block are only allocated once at the start
of the function. For each call, the TPIDR2 block is initialised, and at
the end of the call, a pseudo node (RestoreZA) is planted.

Patch by Sander de Smalen.

Differential Revision: https://reviews.llvm.org/D133900
2022-10-05 14:36:53 +01:00
Johannes Doerfert 93e51fa444 [Attributor] AAPointerInfo can model non-escaping call uses
If a call base use will not capture a pointer we can approximate the
effects. This is important especially for readnone/only uses. Even
may-write uses are not too bad with reachability in place. Capturing
is the problem as we loose track of update sides.
2022-10-05 06:29:14 -07:00
Johannes Doerfert 477e8e10f0 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-10-05 06:19:47 -07:00
Nikita Popov 5fa14ee835 [MemCpyOpt] Don't hoist above producer of pointer operand
This was already handled correctly below, but not checked for the
original store pointer operand. Encountered when converting tests
to opaque pointers, where the intermediate bitcast goes away.
2022-10-05 14:52:33 +02:00
David Stuttard d1d7d2235c [AggressiveInstCombine] Fix cases where non-opaque pointers are used
In the case of non-opaque pointers, when combining consecutive loads,
need to bitcast the pointer source to the combined type size, otherwise
asserts are triggered.

Differential Revision: https://reviews.llvm.org/D135249
2022-10-05 13:42:46 +01:00
Nikita Popov e94619b955 Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
The infinite loop seen on buildbots should be fixed by
11897708c0 (assuming there are not
multiple infinite combine loops...)

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.

Differential Revision: https://reviews.llvm.org/D134954
2022-10-05 14:00:19 +02:00
Nikita Popov 11897708c0 [InstCombine] Directly replace instr in foldIntegerTypedPHI() (NFCI)
Rather than inserting a ptrtoint + inttoptr pair, directly replace
the inttoptr with the new phi node. This ensures that no other
transform can undo it before the pair gets folded away.

This avoids the infinite loop when combined with D134954.

This is NFCI in the sense that it shouldn't make a difference, but
could due to different worklist order.
2022-10-05 13:28:23 +02:00
David Sherwood f0f474dfd0 [AArch64][SME] Add codegen pass to handle ZA state in arm_new_za functions.
The new pass implements the following:

* Inserts code at the start of an arm_new_za function to
    commit a lazy-save when the lazy-save mechanism is active.
* Adds a smstart intrinsic at the start of the function.
* Adds a smstop intrinsic at the end of the function.

Patch co-authored by kmclaughlin.

Differential Revision: https://reviews.llvm.org/D133896
2022-10-05 09:43:57 +00:00
Fraser Cormack 08497a785b [VP] Fix unused variable in release configurations 2022-10-05 10:33:07 +01:00
Florian Hahn 469f0fc6a6
[SimpleLoopUnswitch] Clear dispos in deleteDeadBlocksFromLoop.
SimpleLoopUnswitch may remove blocks from loops. Clear block and loop
dispositions in that case, to clean up invalid entries in the cache.

Fixes #58158.

Fixes #58159.
2022-10-05 10:28:15 +01:00
David Sherwood 991a36da1b [AArch64][SME] Prevent SVE object address calculations between smstop and call
This patch introduces a new AArch64 ISD node (OBSCURE_COPY) that can
be used when we want to prevent SVE object address calculations
from being rematerialised between a smstop/smstart and a call.
At the moment we use COPY to copy the frame index to a register,
which leads to problems because the "simple register coalescing"
pass understands the COPY instruction and attempts to rematerialise
an address calculation with 'addvl' between an smstop and a call.
When in streaming mode the 'addvl' instruction may have different
behaviour because the streaming SVE vector length is not guaranteed
to equal the normal SVE vector length.

The new ISD opcode OBSCURE_COPY gets lowered to a new pseudo
instruction also called OBSCURE_COPY. This ensures it cannot be
rematerialised and we expand this into a simple move very late in
the machine instruction pipeline.

A new test is added here:

CodeGen/AArch64/sme-streaming-interface.ll

Differential Revision: https://reviews.llvm.org/D134940
2022-10-05 08:11:16 +00:00
Martin Storsjö 2f7fbf8376 [AArch64] Add missing SEH_Nop when aligning the stack
This makes sure that the instructions of the prologue matches the
SEH opcodes.

Also remove a couple redundant cases of setting HasWinCFI; it was
already set unconditionally after the conditional cases.

Differential Revision: https://reviews.llvm.org/D135101
2022-10-05 11:00:36 +03:00
Fraser Cormack a3a9b0743e [VP][NFC] Remove \brief commands from doxygen comments
Following a precedent set in D46861.
2022-10-05 08:08:30 +01:00
Fraser Cormack 3362e2d57f [VP] Add IR expansion for vp.icmp and vp.fcmp
These intrinsics are simply expanded to regular icmp/fcmp instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121594
2022-10-05 08:07:39 +01:00
Serguei Katkov d330731f94 [RegAllocFast] Clean-up. Remove redundant operations. NFC.
Reviewed By: MatzeB, arsenm
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D109213
2022-10-05 11:38:54 +07:00
Johannes Doerfert a9557115b4 [Attributor] Qualify variables to avoid clashes in the future 2022-10-04 19:43:04 -07:00
Yeting Kuo 74a130af97 [RISCV] Add isel patterns for vfmacc, vfnmacc, vfmsac and vfnmsac.
The patch selects VSELECT_VL/VP_MERGE_VL that uses VF(N)M(ACC|SAC) as its
true operand and the adden of the true operand as its false operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135080
2022-10-05 09:56:43 +08:00
Stefan Pintilie 30d639180f [PowerPC] Fix the register allocation hints for ACC registers.
The allocation hints for copies of ACC registers assumed that we would only be
copying between VSRp and UACC registers. In reality it is also possible to copy
between UACC and ACC registers.

This patch adds a new case for the ACC copy to fix that issue.
Note that the test case added with this patch will hit an assert without the
fix.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D134501
2022-10-04 20:30:16 -05:00
Stella Laurenzo e28b15b572 Add APFloat and MLIR type support for fp8 (e5m2).
(Re-Apply with fixes to clang MicrosoftMangle.cpp)

This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.

This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf

As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.

Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:

* `F8M<N>` : For FP8 types that can be conceived of as following the
  same rules as FP16 but with a smaller number of mantissa/exponent
  bits. Including the number of mantissa bits in the type name is enough
  to fully specify the type. This naming scheme is used to represent
  the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
  values.

The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).

Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.

MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.

(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)

Differential Revision: https://reviews.llvm.org/D133823
2022-10-04 17:18:17 -07:00
Sam Clegg be758cd4a3 [WebAssembly][MC] Fix missing `else` after `return` due to type checker bug
Once we are in the `Unreachable` we want to disable type checking, but
we were unconditionally returning `true` here which means we encountered
and error.  Instead we unconditionally return false to signal no error.

Fixes: https://github.com/llvm/llvm-project/issues/56935

Differential Revision: https://reviews.llvm.org/D135195
2022-10-04 16:43:22 -07:00
Amara Emerson c5cebf78bd [GlobalISel] Add computeNumSignBits() support for compares.
Doing so allows G_SEXT_INREG to be combined away for many vector cases.

Differential Revision: https://reviews.llvm.org/D135168
2022-10-05 00:28:08 +01:00
Amara Emerson 8055aa8e8a [AArch64][GlobalISel] Make vector G_SEXT_INREG legal and allow combining.
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.

The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.

This needs to be one commit otherwise we start causing tests to fail due to
incomplete support for selection etc.
2022-10-05 00:28:08 +01:00
Craig Topper ece4bb5ab8 [RISCV] Teach SExtWRemoval to recognize sign extended values that come from arguments.
This information is not preserved in MIR today. So this patch adds
information to RISCVMachineFunctionInfo when the vreg is created for
the argument.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D134621
2022-10-04 15:39:10 -07:00
Gulfem Savrun Yeniceri d7592bbb03 Revert "Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit e1dd2cd063 because
the original commit b20e34b39f had
a dramatic increase in the build time of RTfuzzer, which caused Fuchsia
Clang toolchain builders to timeout:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8801248587754572961/overview
2022-10-04 20:57:34 +00:00
Florian Hahn 4f827318e3
[LoopVersioning,LLE] Clear LoopAccessInfoManager after making changes.
Loop versioning changes the control-flow, which may impact SCEVs cached
by for other loops in LoopAccessInfoManager. Clear the manager after
making changes.

Fixes #57825.

Depends on D134609.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134611
2022-10-04 21:35:42 +01:00
David Green cf43154bc3 [AArch64] Ensure condition (SUBS) has no uses of value in performCONDCombine
performCONDCombine removes and 0xff in patterns of
SUBS (and (add(..), 0xff), C)
under certain complex conditions. It doesn't come up often,
but in the lowering of usub.sat where the SUBS is both used as a
condition and as a value, the And is removed where it would only be
valid for the condition.

Fixes #58109.

Differential Revision: https://reviews.llvm.org/D135043
2022-10-04 21:18:30 +01:00
jeff cebec42089 [DAGCombiner] [AMDGPU] Allow vector loads in MatchLoadCombine
Since SROA chooses promotion based on reaching load / stores of allocas, we may run into scenarios in which we alloca a vector, but promote it to an integer. The result of which is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined are coming from ExtractVectorElements which stem from a shared load.

This patch identifies such a pattern and combines it into a load.

Change-Id: I0bc06588f11e88a0a975cde1fd71e9143e6c42dd
2022-10-04 12:16:00 -07:00
Ram-NK a58b6acf1f [NFC][LoopInterchange] Clean up of irrelevent dependency checking with
isOuterMostDepPositive()

The function isOuterMostDepPositive() is checked after negative dependence
vectors are normalized to be non-negative, so there will not be any negative
dependency ('>' as the outermost non-equal sign) after normalization. And
therefore the check in isOuterMostDepPositive() is irrelevent and redundant.

Reviewed By: congzhe

Differential Revision: https://reviews.llvm.org/D132982
2022-10-04 14:54:08 -04:00
Eli Friedman 76ccd1db73 [AArch64] Don't form paired loads from epilogue operations on Windows
AArch64LoadStoreOptimizer has a bunch of different guards to avoid
corrupting Windows SEH prologues/epilogues, but apparently we missed the
case of merging two instructions where the first instruction isn't part
of the epilogue, but the second instruction is.

Fixes issue discovered at https://reviews.llvm.org/D130049#3704064

Differential Revision: https://reviews.llvm.org/D134992
2022-10-04 11:41:59 -07:00
Chris Bieneman 618e5006a5 [DirectX] Generate `dx.resources` metadata entry
This code adds initial support for generating the HLSL resources
metadata entries. It has a lot of `FIXMEs` laying around because there
is a lot more work to do here, but this lays a solid groundwork and can
accurately handle some trivial cases.

I've filed a swath of issues covering the deficiencies here and left the
issues in comments so that we can easily follow them.

One big change to make sooner rather than later is to move some of this
code into a new libLLVMFrontendHLSL so that we can share it with the
Clang CodeGen layer.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D134682
2022-10-04 12:36:39 -05:00
Sanjay Patel 17dcbd8165 [SDAG] don't hoist div/rem through a select with neutral constant
This bug was introduced with D134966.
2022-10-04 13:15:01 -04:00
Daniel Rodríguez Troitiño 86bf43d2ab [ObjectYAML] Support for basic data in code.
This is a split of D134250.

Supports for parsing and dumping the LC_DATA_IN_CODE contents (as binary
data).

This allows more complete testing of llvm-objdump in D133974.

Reviewed By: Higuoxing

Differential Revision: https://reviews.llvm.org/D134569
2022-10-04 09:36:27 -07:00
Craig Topper a38fb90b19 [RISCV] Refactor and improve eliminateFrameIndex.
There are few changes mixed in here.

-Try to reuse the destination register from ADDI instead of always
creating a virtual register. This way we lean on the register
scavenger in fewer case.
-Explicitly reuse the primary virtual register when possible. There's
still a case where both getVLENFactoredAmount and handling large
fixed offsets can both create a secondary virtual register.
-Combine similar BuildMI calls by manipulating the Register variables.

There are still a couple early outs for ADDI, but overall I tried to
arrange the code into steps.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135009
2022-10-04 09:32:27 -07:00
Craig Topper 2734968e32 [RISCV] Restructure eliminateFrameIndex to share more code. NFC
The old code took two different paths based on whether there is
a scalable offset, but these two paths had some code in common.

The main difference between the two code paths was whether we needed
to create a GPR or not for the ADDI that gets created for RVVSpill.
If we had a scalable offset, the same GPR was used as the destination
for adding the scalable offset and the ADDI. To manage this, we now
cache the scratch register and reuse it if it has already been created.

This is a pre-patch for D135009.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D135092
2022-10-04 09:27:48 -07:00
Daniel Rodríguez Troitiño 57bd11f047 [ObjectYAML][MachO] Encode export trie address as ULEB128, not as SLEB128
The `dumpExportEntry` was dumping everything using signed LEB128, but
the format seems to use unsigned LEB128. This can be cross-checked with
the implementation in MachOObjectFile.cpp, the implementation in LLD's
ExportTrie.cpp, and the implementation in macho2yaml.cpp, which all use
ULEB128 functions..

The difference is only apparent when encoding some values with specific
bit patterns (bit active in the 7th, 14th, ... bits of the binary). The
encoding was not always creating problems in the resulting binaries
because if the extra byte was part of the padding, the result of
decoding it as ULEB128 is the same as decoding as SLEB128, however, the
code of MachOObjectFile.cpp (used by llvm-objdump) checks the buffer
decoding position against the reported length, which triggered an error.

Modified a test that used an address with this pattern (0x3FA0, the 14th
bit is active), to show that a round trip still produces the same
results, and added a check using llvm-objdump to use their extra checks
to verify this implementation.

Reviewed By: pete

Differential Revision: https://reviews.llvm.org/D134563
2022-10-04 09:17:58 -07:00
Sanjay Patel 0a1210e482 [InstSimplify] try harder to fold fmul with 0.0 operand
https://alive2.llvm.org/ce/z/oShzr3

This was noted as a missing fold in D134876 (with additional
examples based on issue #58046).

I'm assuming that fmul with a zero operand is rare enough
that the use of ValueTracking will not noticeably increase
compile-time.

This adjusts a PowerPC codegen test that was added with D88388
because it would get folded away and no longer provide coverage
for the bug fix.
2022-10-04 11:20:01 -04:00
Alexey Bataev ab9a81f736 [SLP]Try to emit canonical shuffle with undef operand.
In the canonical form of the shuffle the poison/undef operand is the
second operand, the patch tries to emit canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting freeze instruction for shuffles with undef indices if the second shuffle operan is undef, not poison. It is an initial step to D93818, where undef mask element are treated as returning poison value.

Differential Revision: https://reviews.llvm.org/D134377
2022-10-04 08:16:07 -07:00
Pierre van Houtryve 75b292cb14 [AMDGPU][DAG] Fix insert_vector_elt lowering for 8 bit elements
The bitmask used to extract the bits assumed 16 bit elements and wasn't taking the size of the elements into account.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135156
2022-10-04 14:48:15 +00:00
Sanjay Patel 7f7a0f2f83 [InstSimplify] reduce code duplication for fmul folds; NFC
This is a modification of the earlier attempt from:
7b7940f9da

For fma callers, we only want to swap a 0.0 or 1.0 constant.
2022-10-04 10:29:53 -04:00
Pierre van Houtryve c93104073c [AMDGPU] Always lower SHUFFLE_VECTOR
Make it illegal, remove InstructionSelector logic for it

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134967
2022-10-04 14:23:17 +00:00
Jay Foad af947d9fcb [ISel] Fix crash in new FMA DAG combine
Fix a crash in the FMA combine added by D132837 and amended by D134810.
In cases where the newly created node could be folded, the combiner
would fail this assertion:

llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.

Differential Revision: https://reviews.llvm.org/D135150
2022-10-04 15:13:18 +01:00
Dominik Adamski 6842d35012 [OpenMP][OMPIRBuilder] Add support for order(concurrent) to OMPIRBuilder for SIMD directive
If 'order(concurrent)' clause is specified, then the iterations of SIMD loop
can be executed concurrently.

This patch adds support for LLVM IR codegen via OMPIRBuilder for SIMD loop
with 'order(concurrent)' clause. The functionality added to OMPIRBuilder is
similar to the functionality implemented in 'CodeGenFunction::EmitOMPSimdInit'.

Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134046

Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
2022-10-04 08:30:00 -05:00
Nikita Popov e1dd2cd063 Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
Reapply with a fix for the case where an operand simplified back
to the original phi: We need to map this case to the new phi node.

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
2022-10-04 15:18:34 +02:00
Alex Richardson 16f9c5577d [SimplifyLibCalls] Retain attributes added by Builder.CreateMem*
This currently does not make much of a difference (only one tests is
affected), but it is helpful e.g. for the out-of-tree CHERI target where
Builder.CreateMemCpy() can add attributes other than parameter alignment.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135075
2022-10-04 13:11:34 +00:00
Bjorn Pettersson 491ac8f3e8 [LibCalls] Cast Char argument to 'int' before calling emitFPutC
The helpers in BuildLibCalls normally expect that the Value
arguments already have the correct type (matching the lib call
signature). And exception has been emitFPutC which casted the Char
argument to 'int' using CreateIntCast. This patch moves the cast to
the caller instead of doing it inside emitFPutC.

I think it makes sense to make the BuildLibCall API:s a bit
more consistent this way, despite the need to handle the int cast
in two different places now.

Differential Revision: https://reviews.llvm.org/D135066
2022-10-04 12:52:05 +02:00
Bjorn Pettersson aa1b64cc42 [BuildLibCalls] Use TLI to get 'int' and 'size_t' type sizes
Stop assuming that an 'int' is 32 bits in helpers that emit libcalls
to lib functions that had 'int' in the signature. For most targets
this is NFC. For a target with 16 bit 'int' type this could help out
detecting if trying to emit a libcall with incorrect signature.

Similarly we now derive the type mapping to 'size_t' by asking TLI
about the size of 'size_t'. This should be NFC (at least for in-tree
targets) since getSizeTSize(), in TLI, is deriving the size in the
same way as DataLayout::getIntPtrType().

Differential Revision: https://reviews.llvm.org/D135065
2022-10-04 12:52:05 +02:00
Bjorn Pettersson 73e8d95d28 [BuildLibCalls] Name types to identify when 'int' and 'size_t' is assumed. NFC
Lots of BuildLibCalls helpers are using Builder::getInt32Ty to get
a type matching an 'int', and DataLayout::getIntPtrType to get a
type matching 'size_t'. The former is not true for all targets, since
and 'int' isn't always 32 bits. And the latter is a bit weird as well
as the definition of DataLayout::getIntPtrType isn't clearly mapping
it to 'size_t'.

This patch is not aiming at solving any such problems. It is merely
highlighting when a libcall is expecting to use 'int' and 'size_t'
by naming the types as IntTy and SizeTTy when preparing the type
signatures for the emitted libcalls.

Differential Revision: https://reviews.llvm.org/D135064
2022-10-04 12:52:05 +02:00
Florian Hahn 825e16969e
[LAA] Pass LoopAccessInfoManager instead of GetLAA function.
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.

Depends on D134608.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134609
2022-10-04 11:51:25 +01:00
Amara Emerson 75b18ba14d Revert "[AArch64][GlobalISel] Fold away lowered vector sign-extend of vector compares."
This reverts commit dcd02a524b.

We should instead use the generic combine.
2022-10-04 11:03:02 +01:00
Nikita Popov 6e504d637d [ValueTracking] Handle constant exprs in isKnownNonZero()
Handle constant expressions by falling through to the general
operator-based code. In particular, this adds support for bitcast
and GEP expressions.
2022-10-04 11:58:07 +02:00
Florian Hahn e399dd601f
[SimpleLoopUnswitch] Clear block and loop dispos after destroying loop.
SimpleLoopUnswitch may remove loops. Clear block and loop dispositions,
to clean up invalid entries in the cache.

Fixes #58136.
2022-10-04 10:27:52 +01:00
Nikita Popov 635f93dff7 [SimplifyLibCalls] Place deref attr even if nonnull already set
If nonnull is already set, we currently skip setting both nonnull
and dereferenceable. Make these independent, to avoid regressions
when additional nonnull attributes are inferred earlier.
2022-10-04 11:26:15 +02:00
Nikita Popov 0f32f0e147 Revert "[InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit b20e34b39f.

This causes RAUW type mismatch assertions on some buildbots,
reverting for now.
2022-10-04 11:17:09 +02:00
Nikita Popov 45dec8f5fd [ValueTracking] Avoid known bits fallthrough for freeze (NFCI)
The known bits logic should never produce a better result than
the direct recursive non-zero query here, so skip the fallthrough.
2022-10-04 11:02:31 +02:00
Nikita Popov 9c0314f54e [ValueTracking] Switch isKnownNonZero() to switch over opcodes (NFCI)
The change in the assume-queries-counter.ll test is because we skip
and unnecessary known bits query for arguments.
2022-10-04 10:54:28 +02:00
Nikita Popov b20e34b39f [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
2022-10-04 10:12:14 +02:00
Florian Hahn db720dc17c
[LAA] Use LoopAccessInfoManager in legacy pass.
Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from
D134606. As a side-effect this also removes printing support from
LoopAccessLegacyAnalysis.

Depends on D134606.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134608
2022-10-04 08:37:11 +01:00
Lang Hames 3019f488f4 [ORC] Don't unnecessarily copy collection element. 2022-10-03 21:50:01 -07:00
Craig Topper 05df15965b [RISCV] Use _TIED form of VFWADD(U)_WV/VFWSUB(U)_WV to avoid early clobber.
One of the sources is the same size as the destination so that source
doesn't have an overlap with the destination register. By using the _TIED
form we avoid an early clobber contraint for that source.

This matches what was already done for instrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.
2022-10-03 21:44:08 -07:00