When zlib is disabled at build time, the diagnostic `LLVM was not compiled with
LLVM_ENABLE_ZLIB: cannot decompress` for --decompress-debug-sections may be
inaccurate: if zstd is enabled, we should still support zstd decompression.
It's not useful to test for both zlib and zstd availability here. Just remove
the diagnostic and add a new one before `compression::decompress`.
This fixes compress-debug-sections-zstd.test.
Reviewed By: mariusz-sikora-at-amd, jhenderson, phosek
Differential Revision: https://reviews.llvm.org/D135744
Add a check (can be disabled via a flag) that the pipeline we generate is actually parsable.
It can be disabled because we don't expect to handle every pass in -print-pipeline-passes.
Fixes #58280.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135703
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/CLKzqT
This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.
This extends 008a89037a and is another transform
derived from issue #58137.
We need to set the insert point for extractelement to point to the first
instruction in the node to avoid a possible crash during the external-uses
combine process. Without it we may end up with an incorrect
transformation.
Differential Revision: https://reviews.llvm.org/D135591
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/E5eaxU
This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
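A minimal IR sketch of the shape being folded (assuming the unsigned form
with nuw on both shifts; the alive2 link above has the verified
preconditions):
```
define i8 @src(i8 %x, i8 %y, i8 %z) {
  %xs = shl nuw i8 %x, %z
  %ys = shl nuw i8 %y, %z
  %d  = udiv i8 %xs, %ys
  ret i8 %d
}
; --> %d = udiv i8 %x, %y
```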
A freeze instruction in some cases makes codegen worse, so we need to be very
careful when emitting it. Instead, improve the analysis in the isUndefVector
function to generate a mask of unused elements and use it in the analysis.
Differential Revision: https://reviews.llvm.org/D135382
If an AM* atomic memory access instruction uses the same register number for
rd and rj, execution will trigger an Instruction Non-defined Exception.
If it uses the same register number for rd and rk, the execution result is
uncertain.
Reference: https://github.com/loongson/LoongArch-Documentation
Differential Revision: https://reviews.llvm.org/D135641
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133762
They have been scattered over the code. For better structuring, perform
them in one place. A slight compile-time drop is possible because we collect
exit blocks twice, but it's a small price to pay for much better code structure.
Fix the crash issue of D129537 and reopen it.
Currently the X86 shuffle lowering would widen the element type for a
shuffle if the mask element values are adjacent. For the example below:
%t2 = add nsw <16 x i32> %t0, %t1
%t3 = sub nsw <16 x i32> %t0, %t1
%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
<16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i32> %t4
The compiler would transform the shuffle to:
%t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
<8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
i32 5, i32 6, i32 7>
This may lose the opportunity to let ISel select a mask instruction when
AVX512 is enabled.
This patch prevents the transform when the AVX512 feature is enabled.
Thanks to Simon for the idea.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130830
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.
This allows reverting the functional parts of
2f7fbf8376 (but fixing inconsistent
duplicate setting of HasWinCFI).
Differential Revision: https://reviews.llvm.org/D135686
Given this is an OR reduction, the two are equivalent and later
optimizations (AArch64InstrInfo::optimizePTestInstr) may rewrite the
sequence to use the flag-setting variant of instruction X, to remove the
PTEST altogether.
Reviewed By: paulwalker-arm, bsmith
Differential Revision: https://reviews.llvm.org/D134946
The BRKNS instruction is unlike the other instructions that set flags
since it has an all active implicit predicate, so the existing
PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B)
in AArch64InstrInfo::optimizePTestInstr is incorrect, however
PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B)
is correct.
Spotted by @paulwalker-arm in D134946.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D135655
These are semantically two different stages, but they were entwined in the
old implementation. Now cost computation does not do legality checks,
and they are all done beforehand.
The patch selects VSELECT/VP_MERGE_VL which uses fmadd/fnmsub as the true
operand and the addend of the fmadd/fnmsub as the false operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135330
If the source is implicit_def, the register allocator won't have
any constraint on what register it picks for the destination. This
doesn't give the user much control of what register is being used.
So in my mind that means the only reason to honor the policy operand
is to control what policy is used in vsetvli to maybe avoid a vtype
change. Given the other optimizations we do on the policy field, I
don't think allowing the user this control is reliable.
Therefore, I think we should use agnostic policies if the source is
undef.
This should give better performance on some CPUs for VP intrinsics where
there is no merge operand and the backend adds IMPLICIT_DEF to the instruction.
Differential Revision: https://reviews.llvm.org/D135396
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
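At the IR level the equivalence looks like this (a sketch; per the message
the pattern arises in the backend during type legalization):
```
; %s is 0 when %x >= 0 and -1 when %x < 0, so %m is 0 or -%y:
%s = ashr i32 %x, 31
%m = mul i32 %s, %y
; equivalent to a conditional negate:
%isneg = icmp slt i32 %x, 0
%neg   = sub i32 0, %y
%m2    = select i1 %isneg, i32 %neg, i32 0
```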
Fixes PR57549.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399
A local linkage GlobalObject in a non-prevailing COMDAT remains defined while
its leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
.addrsig_sym forces registering the symbol regardless of whether it is otherwise
registered. This creates an undefined symbol which is inconvenient/undesired:
* `extern int x; void f() { (void)x; }` has inconsistent behavior regarding whether `x` is emitted as an undefined symbol.
`-O0 -faddrsig` makes `x` undefined while other -O levels and -fno-addrsig eliminate the symbol.
* In ThinLTO, after a non-prevailing linkonce_odr definition is converted to available_externally, and then a declaration,
the addrsig code emits a symbol while the symbol is otherwise unseen.
D135427 fixed a bug that a non-prevailing `__cxx_global_var_init` was
incorrectly retained. However, the IR declaration causes an undesired
`.addrsig_sym __cxx_global_var_init`. This can be addressed in a way similar
to D101512 (`isTransitiveUsedByMetadataOnly`) but the increased
`OutStreamer->emitAddrsigSym(getSymbol(&GV));` complexity makes me nervous.
Just ignoring unregistered symbols circumvents the problem.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D135642
This adds infrastructural pieces for an analysis to compute the DXIL
shader flags. In this state the analysis can compute two fairly
straightforward feature flags for use of double-precision floating
point values and the DX 11.1 extended double support.
This patch does conflict with D135190, conflicts will be resolved prior
to merging.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135393
# Conflicts:
# llvm/lib/Target/DirectX/CMakeLists.txt
# llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
Unfortunately, the handling of this in the runtime of ROCm 5.3 is broken. The
runtime is expected to handle this correctly when v5 becomes
the default.
Differential Revision: https://reviews.llvm.org/D134714
The previous code used an APInt(1, 0) to represent the demanded elts of a scalable vector, and then ignored that argument if the type was scalable. This was inconsistent with the UndefElts parameter, which is set to either APInt(1, 0) or APInt(1, 1) - that is, implicitly broadcast across all lanes. Particularly since the undef code relied on the DemandedElts parameter having bitwidth 1 to achieve that result!
This change switches the demanded parameter to APInt(1,1), documents the broadcast semantics, and takes advantage of it to remove one special case for scalable vectors which is no longer required.
Make support more generic to support future instructions.
Currently NFC.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D135678
Update the comment, and add an assertion to check a property expected by the sole (non-test) caller. Remove tests which appear to have been copied from fixed-vector tests, and whose demanded bits don't correspond to the way this interface is otherwise used.
`getFunctionParamOptimizedAlign` was being passed a null function
argument when getting the callee of a bitcasted function symbol. This is
because `CallBase::getCalledFunction` does not look through bitcasts.
There is already code to handle this case in
`NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into
an NVPTX util.
The alignment computation now gracefully handles computing alignment of
virtual functions with a check for null.
((Op1 * X) / Y) / Op1 --> X / Y
https://alive2.llvm.org/ce/z/JYxWjA
InstSimplify handles the more basic mul+div pattern with
shared operand, but we don't seem to have any reassociation
folds to handle cases where the common op is further away.
This is a generalization of 9cff4711ac and another
transform derived from issue #58137.
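A sketch of the fold in IR (the overflow flags are assumptions; the alive2
link above has the verified preconditions):
```
define i8 @src(i8 %x, i8 %y, i8 %op1) {
  %m  = mul nuw i8 %op1, %x
  %d1 = udiv i8 %m, %y
  %d2 = udiv i8 %d1, %op1
  ret i8 %d2
}
; --> %d2 = udiv i8 %x, %y
```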
Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html
k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.
m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.
ZB: An address that is held in a general-purpose register. The offset
is zero.
ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.
Note:
The INLINEASM SDNode flags in the tests below are updated because the newly
introduced enum `Constraint_k` is added before `Constraint_m`.
llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
llvm/test/CodeGen/X86/callbr-asm-kill.mir
This patch passes `ninja check-all` on a X86 machine with all official
targets and the LoongArch target enabled.
Differential Revision: https://reviews.llvm.org/D134638
These are harmless for the unwinder - the unwinder doesn't need to
handle them to be able to unwind correctly.
Only add the opcodes when the branch target is in a SEH prologue;
for jumptables e.g. within a function, we shouldn't add any SEH
opcodes.
Differential Revision: https://reviews.llvm.org/D135277
There are static and dynamic TLS address lowering in DAG stage according
to different TLS models.
TLS address will be lowered to pseudo instruction and then expanded by
the `LoongArch Pre-RA pseudo instruction expansion` pass.
Differential Revision: https://reviews.llvm.org/D134713
As suggested on D135572, return Optional<> from getAllocSizeArgs()
rather than the peculiar pair(0, 0) sentinel.
The method on Attribute itself does not return Optional, because
the attribute must exist in that case.
This regularly comes up as a stumbling stone when adding int
attributes: they currently need to be encoded in a way that avoids
the zero value.
This adds support for zero-value int attributes by a) making the
ctor determine int/enum attribute based on attribute kind, not
whether the value is non-zero and b) switching getRawIntAttr()
to return an Optional, so that it's possible to distinguish a zero
value from non-existence.
Differential Revision: https://reviews.llvm.org/D135572
These accessors are not used. Generally, nowadays it is preferable
to perform queries on AttributeSets/Lists, rather than the
AttrBuilder, which is optimized towards attribute construction now.
This was the odd one out, with similar methods not existing for
any other attributes. In the places where it is used, it is best
replaced by AttrBuilder::getAttribute(), which allows us to both
test for presence of the attribute and retrieve its value at the
same time. (To just check for presence, contains() could be used.)
Proper construction functions for these have long since been
exposed, and these attributes require a type nowadays, so drop the
old compatibility code.
The algorithm in allLoopPathsLeadToBlock() does not correctly handle the case
where the loop latch is part of the predecessor set: in
this case, we may take the backedge (escaping to a different loop
iteration) and not execute other latch successors. This can happen
if the latch is part of an inner cycle.
Fixes https://github.com/llvm/llvm-project/issues/57780.
Differential Revision: https://reviews.llvm.org/D134279
When Index is variable but still trivially known to be equal, we can use the
Value from before the insertion, possibly eliminating the vector.
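A sketch of the shape (names and types assumed):
```
%w = insertelement <4 x float> %v, float %val, i64 %idx
%e = extractelement <4 x float> %w, i64 %idx
; --> %e is just %val, and %w may become dead
```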
Reverts a functional change from:
Author: Philip Reames <listmail@philipreames.com>
Date: Wed Dec 8 12:21:10 2021 -0800
[instcombine] A couple style tweaks to visitExtractElementInst [nfc]
Thanks to Michele Scandale for identifying the bug
Differential Revision: https://reviews.llvm.org/D135625
The patch fixes lowering of anonymous functions, removes file/linkage
info for builtin call demangling, and adds relevant test demonstrating
a fixed problem.
Differential Revision: https://reviews.llvm.org/D135390
These instructions already had errors for operands that could not share
the same register:
VCMUL, VMULL, VQDMULL.
This extends that to a few others:
VREV64, VQDMULLqr, VCADD and VHCADD.
Only the i32 types require the error.
Differential Revision: https://reviews.llvm.org/D135560
This extends the existing SCEV verification to catch cache invalidation
issues as in #57837.
The validation logic is similar to the recently added loop disposition
cache validation in bb68b2402d.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134531
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask
This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.
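Concretely, with an assumed index of 2 and 4-wide vectors, the fold looks like:
```
%e = extractelement <4 x float> %src, i64 2
%n = fneg float %e
%r = insertelement <4 x float> %dest, float %n, i64 2
; -->
%negsrc = fneg <4 x float> %src
%r      = shufflevector <4 x float> %dest, <4 x float> %negsrc,
                        <4 x i32> <i32 0, i32 1, i32 6, i32 3>
```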
This shows up in the motivating example from issue #58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.
Differential Revision: https://reviews.llvm.org/D135278
This reverts commit 4fbe33593c. It causes linking errors, with details provided internally. (Hopefully the author/reviewers will be able to upstream the internal repro).
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
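For example, assuming a divisor of 12 (= 3 << 2): x urem 12 can be computed
as (((x >> 2) urem 3) << 2) | (x & 3), because the two low bits shifted out
of the dividend pass straight through to the remainder.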
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D135541
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.
Fixes https://github.com/llvm/llvm-project/issues/50537.
Differential Revision: https://reviews.llvm.org/D130466
The return type is two u8 packed into a 16 bit VGPR, so this instruction
should be True16.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D135478
This commit fixes https://github.com/llvm/llvm-project/issues/57326.
Currently we would take a Mask out and directly use it by doing
auto Mask = SVI->getShuffleMask();
However, if the mask is undef, this Mask is not initialized. It might be
a vector of -1 or random integers.
This would cause an out-of-bounds read later when trying to find a
StartMask.
This change checks that all indices in the Mask are in the allowed range,
and fixes the out-of-bounds accesses.
Differential Revision: https://reviews.llvm.org/D132634
When determining the initial value of the object, use the constant
folding API to load a given type at a given offset in the global
initializer. This makes it work for cases where the load doesn't
directly correspond to an aggregate member.
Differential Revision: https://reviews.llvm.org/D135435
Now that ExecutionSession objects always have ExecutorProcessControl (EPC)
objects attached, we can use EPCEHFrameRegistrar by default, rather than
InProcessEHFrameRegistrar. This allows LLJIT to work out-of-the-box with remote
EPCs on platforms that use JITLink, without requiring a custom
ObjectLinkingLayerCreator to override the eh-frame registrar.
On many AArch64 processors (Cortex A78, Neoverse N1/N2/V1, etc.), ADD with an LSL shift (shift amount <= 4) has smaller latency and higher
throughput than ADD with a larger shift (shift amount > 4). This is at least a no-op for the rest of the processors.
Differential Revision: https://reviews.llvm.org/D135208
The current decomposition for GEPs did not correctly handle cases where
GEPs access different source types. Adjust the constraints by including
the indexed type-size as coefficients.
Further generalization to allow GEPs with more than one index is a
needed follow-up improvement.
If the location pointer to be killed is in no loop and the function does not
have irreducible loops, then we can regard it as loop-invariant.
Differential Revision: https://reviews.llvm.org/D135369
Move common logic shared by callers of getConstraint that use the result
to query the constraint system to a new helper getConstraintForSolving.
This includes common legality checks (i.e. not an equality constraint,
no new variables) and the logic to query the unsigned system if possible
for signed predicates.
This is the AIX part of the update after https://reviews.llvm.org/D117225
Fixes the issue that AIX64 with vector pairs enabled saw redundant
spill/reload of callee-saved vector registers.
Based on original patch by: Kai Luo
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D133466
Otherwise eliminateFrameIndex cannot figure out how to fixup the stack
offset with its stateless logic, because there wouldn't be an immediate
slot for it to trivially write to, and it may not be easy to transform
the surrounding code to make it work.
This fixes a fairly common crash when compiling moderately complex code with
Clang.
Differential Revision: https://reviews.llvm.org/D135251
InclusionRewriter on Windows (CRLF line endings) will exercise this in a
hot path. Calling memcmp repeatedly would be highly suboptimal for that
use case, so give it a specialized path.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D133660
See the updated linkonce_resolution_comdat.ll. A local linkage GV in a
non-prevailing COMDAT remains defined while its leader has been made
available_externally. This violates the COMDAT rule that its members must be
retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
The 1st attempt failed to update the test checks as expected.
Original commit message:
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)
https://alive2.llvm.org/ce/z/kB6VF7
It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)
https://alive2.llvm.org/ce/z/kB6VF7
It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
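A sketch of the shape (how non-negativity of the shl is established is left
to the analysis):
```
%pow2 = shl i8 1, %sh            ; assumed known non-negative
%d    = sdiv exact i8 %x, %pow2
; --> %d = ashr exact i8 %x, %sh
```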
The logic added in 3771310eed was placed sub-optimally. Applying the
transform in ::getConstraint meant that it would also impact conditions
that are added to the system by the signed <-> unsigned transfer logic.
This meant we failed to add some signed facts to the signed system. To
make sure we still add as many useful facts to the signed/unsigned
systems, move the logic to the point where we query the system.
This patch fixes the failure of llvm/test/CodeGen/Generic/vector.ll and
CodeGen/PowerPC/2007-11-19-VectorSplitting.ll for a LoongArch native build.
Differential Revision: https://reviews.llvm.org/D134798
There are intrinsics for most scalar instructions and almost all HVX
instructions. What's somewhat painful is that there are two intrinsics
for each HVX instruction: one for 64- and one for 128-byte mode.
Instead of checking the current codegen settings every time, this
function would simply return the right intrinsic.
The first two parameters of addcarry are commutative. We may face a situation where both variants are present in the DAG, in which case we benefit from using just one.
Depends on D57302 and D33587
Reviewed By: RKSimon, chfast
Differential Revision: https://reviews.llvm.org/D57317
Typically when you match something, you want to check the result.
Fix a couple warnings in the AMDGPUPostLegalizerCombiner which appear as a
result of this.
Differential Revision: https://reviews.llvm.org/D135491
Clear all dispositions if there are any dead blocks (which will get
removed later) and also clear dispositions for removed instructions.
Clearing all dispositions in case there are dead blocks happens first,
which should avoid traversing SCEV use-lists for invalidating
dispositions for individual values.
Fixes #58179.
Apparently StackColoring depends on SlotIndexes, but not
LiveIntervals. If regalloc fast were manually requested, LiveIntervals
would be dropped before SILowerSGPRSpills but not SlotIndexes.
SILowerSGPRSpills preserved SlotIndexes, but only through
LiveIntervals. As a result, SILowerSGPRSpills was incorrectly
reporting it preserved SlotIndexes. Start updating these directly,
instead of depending on LiveIntervals also being available.
This pass was added way back in the beginning of the work which became the statepoint infrastructure. The idea was that safepoints could be inserted late in the optimization pipeline. This is true if the only concern is garbage collection, but this approach turned out to be incompatible with the requirement to also support deoptimization at safepoints.
In theory, this pass would still be quite useful for an AOT compiled language which wants to support garbage collection, but we have no known users, and haven't for over 5 years. Time to remove unused code. If someone wants to use this, restoring it would not be hard. The immediate motivation for removal is that this is one of the last remaining passes which hasn't been ported to the new pass manager, and the (straightforward) work to do so is not justified for unused code.
Differential Revision: https://reviews.llvm.org/D135371
Change the behavior of the `llvm-profdata show --debug-info=` command to dump a YAML file when using debug info correlation since it provides more information in a parseable format.
Reviewed By: yozhu, phosek
Differential Revision: https://reviews.llvm.org/D134770
1. `length(value/type)`: return the number of elements in the vector
input,
2. `getHvxTy(elem_type)`: return the HVX vector type with the element
type provided.
These will help write things more succinctly.
EVT can be created for any Type, and so this function can now be used to
check if given Type, as-is, is an HVX type (as opposed to a type that may
be subject to legalization to an HVX type).
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.
We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134614
When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (to match with the size in bits of a wchar_t).
The problem I've seen is that llvm::getConstantDataArrayInfo takes
both an "ElementSize" argument (basically indicating the size of a
char/element in bits) and an "Offset" which, afaict, is an offset
in the unit "number of elements". Then it also uses
stripAndAccumulateConstantOffsets to get a "StartIdx" which, afaict,
is calculated in bytes. The returned Slice.Length is based on
arithmetic that adds/subtracts variables that have different
units (bytes vs elements). Most notably, I think the "StartIdx" must
be scaled using the "ElementSize" to get correct results.
The symptom of the above problem was seen in the wcslen-1.ll test
case which miscompiled.
This patch is supposed to resolve the bug by converting between
bytes and elements when needed.
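For example, with a 32-bit wchar_t, a "StartIdx" of 8 bytes corresponds to
element index 2, not 8; using the byte value directly in element-based
arithmetic makes the computed Slice.Length wrong by a factor of ElementSize/8.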
Differential Revision: https://reviews.llvm.org/D135263
This patch moves the emitOffloadingArraysArgument function and
supporting data structures to OpenMPIRBuilder. This will later be used
in flang as well. The TargetDataInfo class was split up into generic
information and clang-specific data, which remain in clang. Further
migration will be done in the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134662
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.
It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.
The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.
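A sketch of the pattern (names assumed): after peeling one iteration, %first
is known to be false on every remaining iteration, so the branch becomes
loop-invariant and folds away:
```
loop:
  %first = phi i1 [ true, %entry ], [ false, %latch ]
  br i1 %first, label %first.iter.code, label %rest
```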
Differential Revision: https://reviews.llvm.org/D134803
The scalar counterpart of this is `llvm.trunc`. However, the name
ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this as
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So adding
`vp.roundtozero` that will look similar to `vp.roundeven`.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D135233
As LoopPredication performs non-equivalent transforms removing some
checks from loops, other passes may not be able to perform transforms
they'd be able to do if the checks were left in loops.
This patch makes LoopPredication insert assumes of the replaced
conditions either after a guard call or in the true block of
widenable condition branch.
Differential Revision: https://reviews.llvm.org/D135354
Relative to the previous attempt, this adjusts simplification to
use the correct context instruction: We need to use the terminator
of the incoming block, not the original instruction.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Similar to the current "Trunc/BuildVector" folding, which folds low-element extracts of BuildVectors, this folds high-element extracts done using bitshifts.
For D134354
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135148
Regression from D131400: cross-language LTO causes a crash in the
compiler on the NULL deref of Scope in `isa` call when Rust IR is
involved. Presumably, this might affect other languages too, and
even Rust itself without cross-language LTO when the Rust compiler
switched to LLVM 16.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D134616
Now only DXILTranslateMetadata uses DXILResources, so DXILResourceWrapper is only used by DXILTranslateMetadata.
Once we add lowering for createHandle, DXILResourceWrapper will be used in more passes.
We can also add resource index allocation in DXILResourceWrapper.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D135190
I tend to think we should ignore the policy bit in vsetvli insertion
if the tied operand is IMPLICIT_DEF. But that raises questions about
what the policy operand on RVV intrinsics means if you also pass
vundefined().
This change at least fixes some cases. I'll post a separate patch
for vsetvli insertion for discussion.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135386
We can lower these as an or with the negative of the condition value. This appears to result in significantly less branch-y code on multiple common idioms (as seen in tests).
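Assuming the selects in question have an all-ones arm, the idea at the IR
level is (a sketch; the actual lowering happens in the backend):
```
%r = select i1 %c, i32 -1, i32 %x
; is equivalent to:
%ext = zext i1 %c to i32
%neg = sub i32 0, %ext           ; 0 or all-ones
%r2  = or i32 %x, %neg
```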
Differential Revision: https://reviews.llvm.org/D135316
This is a non-trivial property relied upon by D135396. I wrote this to convince myself it was true.
Differential Revision: https://reviews.llvm.org/D135403
This patch had to be reverted because on gcc 7.5.0 we see an error converting from std::unique_ptr<MCRegisterInfo> to Expected<std::unique_ptr<MCRegisterInfo>> as the return type for the function createRegInfo. This has now been fixed.
Added analysis for invariant extractelement instructions and improved
detection of the CSE blocks for generated extractelement instructions.
Differential Revision: https://reviews.llvm.org/D135279
Clang may optimize conditional tailcall blocks with the following layout:
cmp <condition>
je tailcall_target
ret
When retpoline is in place, indirect calls are converted into direct calls to a retpoline thunk. When these indirect calls are tail calls, they may be subject to the above described optimization (there is no indirect JCC, but since now the jump is direct it can be made conditional). The above layout is non-ideal for the Linux kernel scenario because the branches into thunks may be patched back into indirect branches during runtime depending on the underlying CPU features, which would not be feasible if the binary is emitted with the optimized layout above.
Thus, prevent clang from emitting it if the CodeModel is Kernel.
Feature request from the respective kernel mailing list: https://lore.kernel.org/llvm/Yv3uI%2FMoJVctmBCh@worktop.programming.kicks-ass.net/
Reviewed By: nickdesaulniers, pengfei
Differential Revision: https://reviews.llvm.org/D134915
The limitation in LibCallSimplifier::optimizeStringLength to only
optimize when the string is an i8 array was already changed in
commit 50ec0b5dce back in 2017.
We still only simplify when 's' points at an array of 'CharSize', so
the comment is still valid in the sense that we do not support
arbitrary array types.
Differential Revision: https://reviews.llvm.org/D135261
Optimization for using compressed beqz and bnez.
If there is a pattern
```
br_cc val1 constval eq/neq place
select_cc val1 constval eq/neq trueval falseval
```
and constval does not fit in the compressed imm format (6 bits) but fits in
the imm format (12 bits), we can replace it with a non-compressed subtraction
(addi with -constval) and a compressed c.beqz/c.bnez:
```
addi val val -constval
c.beqz val place
```
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132839
Prior to this patch, FixedPointSemantics and APFixedPoint only support semantics where
the Scale is greater than or equal to zero and the Width is greater than or equal to the Scale.
This patch removes both those requirements while staying API compatible.
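For example, with the requirements removed, semantics such as Width=8,
Scale=-2 become expressible (each raw integer step represents 4, so only
multiples of 4 are representable), as do semantics like Width=8, Scale=10,
where every representable value is purely fractional.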
This changes the default value used for mask policy from mask undisturbed to mask agnostic. In hardware, there may be a minor preference for ta/ma, but since this is only going to apply to instructions which don't use the mask policy bit, this is functionally mostly a nop. The main value is to make future changes to using MA when legal for masked instructions easier to review by reducing test churn.
The prior code was motivated by a desire to minimize state transitions between masked and unmasked code. This patch achieves the same effect using the demanded field logic (landed in afb45ff), and there are no regressions I spotted in the test diffs. (Given the size, I have only been able to skim.) I do want to call out that regressions are possible here; the demanded analysis only works on a block local scope right now, so e.g. a tight loop mixing masked and unmasked computation might see an extra vsetvli or two.
Differential Revision: https://reviews.llvm.org/D133803
Make sure conditions with constant operands come before conditions
without constant operands. This increases the effectiveness of the
current signed <-> unsigned fact transfer logic.
Currently, AAResultBase (from which alias analysis providers inherit)
stores a reference back to the AAResults aggregation it is part of,
so it can perform recursive alias analysis queries via
getBestAAResults().
This patch removes the back-reference from AAResultBase to AAResults,
and instead passes the used aggregation through the AAQueryInfo.
This can be used to perform recursive AA queries using the full
aggregation.
Differential Revision: https://reviews.llvm.org/D94363
We can still get a NaN even if none of the operands are NaN,
e.g. from +inf/-inf. D50804 didn't catch that.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134854
Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135149
The patch introduces reading kernel argument attributes from both
function-attached and module-level metadata during kernel argument lowering.
Two tests are added to show the improvement.
Differential Revision: https://reviews.llvm.org/D135106
Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey.tretyakov@mail.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
This patch allows the combines that fold extensions in binary operations
to have more than one use.
The approach here is pretty conservative: if all the users of an
extension can fold the extension, then the folding is done, otherwise we
don't fold.
This is the first step towards avoiding the one-use limitation.
As a result, we make a decision to fold/don't fold for a web of
instructions. An instruction is part of the web of instructions as soon
as it consumes an extension that needs to be folded for all its users.
Because of how SDISel works a web of instructions can be visited over
and over. More precisely, if the folding happens, it happens for the
whole web and that's the end of it, but if the folding fails, the whole
web may be revisited when another member of the web is visited.
To avoid a compile time explosion in pathological cases, we bail out
earlier for webs that are bigger than a given threshold (arbitrarily set
at 18 for now.) This size can be changed using
`--riscv-lower-ext-max-web-size=<maxWebSize>`.
At the current time, I didn't see a better scheme for that, assuming we
want to stick with doing this in SDISel.
Differential Revision: https://reviews.llvm.org/D133739
This adds some missing tablegen patterns to handle trn1/trn2/zip1/zip2/uzp1/uzp2,
similar to the Arm handling in 5e1a9d319d, but via tablegen
patterns for the AArch64 backend.
This patch centralizes all the combines of add|sub|mul with extended
operands in one "framework".
The rationale for this change is to offer a one-stop-shop for all these
transformations so that, in the future, it is easier to make combine
decisions for a web of instructions (i.e., instructions connected
through s|zext operands).
Technically this patch is not NFC because the new version is more
powerful than the previous version.
In particular, it diverges in two cases:
- VWMULSU can now also be produced from `mul(splat, zext)`, whereas
previously only `mul(sext, splat)` were supported when `splat`s were
involved. (As demonstrated in rvv/fixed-vectors-vwmulsu.ll)
- VWSUB(U) can now also be produced from `sub(splat, ext)`, whereas
previously only `sub(ext, splat)` were supported when `splat`s were
involved. (As demonstrated in rvv/fixed-vectors-vwsub.ll)
If we wanted, we could block these transformations to make this
patch really NFC. For instance, we could do something similar to
`AllowSplatInVW_W`, which prevents the combines to form vw(add|sub)(u)_w
when the RHS is a splat.
Regarding the "framework" itself, the bulk of the patch is some
boilerplate code that abstracts away the actual extensions that are
present in the DAG. This allows us to handle `vwadd_w(ext a, b)` as if
it was a regular `add(ext a, ext b)`. Since the node `ext b` doesn't
actually exist in the DAG, we have a bunch of methods (all in the
NodeExtensionHelper class) that fake all that for us.
The other half of the change is around `CombineToTry` and
`CombineResult`. These helper structures respectively:
- Represent the kind of combines that can be applied to a node, and
- Store what needs to happen to do that combine.
This can be viewed as a two step approach:
- First, check if a pattern applies, and
- Second apply it.
The checks and the materialization of the combines are decoupled so that
in the future we can perform several checks and do all the related
applies in one go.
Differential Revision: https://reviews.llvm.org/D134703
Sometimes when a function is inlined into a different CU, `llvm-dwarfdump --verify` would find an inlined subroutine with an invalid abstract origin. This is because `DwarfUnit::addDIEEntry()` will incorrectly assume the inlined subroutine and the abstract origin are from the same CU if it can't find the CU for the inlined subroutine.
In the added test, the inlined subroutine for `bar()` is created before the CU for `B.swift` is created, so it tries to point to `goo()` in the wrong CU. Interestingly, if we swap the order of the two functions then we don't see a crash since the module for `goo()` is created first.
The fix is to give a parent DIE to `ScopeDIE` before calling `addDIEEntry()` so that its CU can be found. Luckily, `constructInlinedScopeDIE()` is only called once so we can pass it the DIE of the scope's parent and give it a child just after it's created.
`constructInlinedScopeDIE()` should always return a DIE, so assert that it is not null.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D135114
For V_CMP_CLASS_F16_t16_e64 and V_CMPX_CLASS_F16_t16_e64,
https://reviews.llvm.org/D133723 changed the value type of src1 from i32 to i16.
These src1 operands are 16 bits, therefore need to be encoded as true16
operands. So the _e32 type was correctly set to VGPR_32_Lo128.
In the _e64 form the operand class went from
VSrc_b32 to VSrc_b16. For some reason, we cannot encode inline literals for
VSrc_b16, see 5f5f566b26. In this phase of
the true16 implementation, VSrc_b16 and VSrc_b32 are still similar,
except for that quirk of inline literals. So set the operand class back to
regain that functionality.
Reviewed By: dp, arsenm
Differential Revision: https://reviews.llvm.org/D134897
Setting up a lazy-save mechanism around calls is done during SelectionDAG
because calls to intrinsics may be expanded into an actual function call
(e.g. calls to @llvm.cos()), and maintaining an allowed-list in the SMEABI
pass is not feasible.
The approach for conditionally restoring the lazy-save based on the runtime
value of TPIDR2_EL0 is similar to how we handle conditional smstart/smstop.
We create a pseudo-node which gets expanded into a conditional branch and
expands to a call to __arm_tpidr2_restore(%tpidr2_object_ptr).
The lazy-save buffer and TPIDR2 block are only allocated once at the start
of the function. For each call, the TPIDR2 block is initialised, and at
the end of the call, a pseudo node (RestoreZA) is planted.
Patch by Sander de Smalen.
Differential Revision: https://reviews.llvm.org/D133900
If a call base use will not capture a pointer, we can approximate the
effects. This is important especially for readnone/readonly uses. Even
may-write uses are not too bad with reachability in place. Capturing
is the problem, as we lose track of update sites.
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
This was already handled correctly below, but not checked for the
original store pointer operand. Encountered when converting tests
to opaque pointers, where the intermediate bitcast goes away.
In the case of non-opaque pointers, when combining consecutive loads, we
need to bitcast the pointer source to the combined type size; otherwise
asserts are triggered.
Differential Revision: https://reviews.llvm.org/D135249
The infinite loop seen on buildbots should be fixed by
11897708c0 (assuming there are not
multiple infinite combine loops...)
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Rather than inserting a ptrtoint + inttoptr pair, directly replace
the inttoptr with the new phi node. This ensures that no other
transform can undo it before the pair gets folded away.
This avoids the infinite loop when combined with D134954.
This is NFCI in the sense that it shouldn't make a difference, but
could due to different worklist order.
The new pass implements the following:
* Inserts code at the start of an arm_new_za function to
commit a lazy-save when the lazy-save mechanism is active.
* Adds a smstart intrinsic at the start of the function.
* Adds a smstop intrinsic at the end of the function.
Patch co-authored by kmclaughlin.
Differential Revision: https://reviews.llvm.org/D133896
SimpleLoopUnswitch may remove blocks from loops. Clear block and loop
dispositions in that case, to clean up invalid entries in the cache.
Fixes #58158.
Fixes #58159.
This patch introduces a new AArch64 ISD node (OBSCURE_COPY) that can
be used when we want to prevent SVE object address calculations
from being rematerialised between a smstop/smstart and a call.
At the moment we use COPY to copy the frame index to a register,
which leads to problems because the "simple register coalescing"
pass understands the COPY instruction and attempts to rematerialise
an address calculation with 'addvl' between an smstop and a call.
When in streaming mode the 'addvl' instruction may have different
behaviour because the streaming SVE vector length is not guaranteed
to equal the normal SVE vector length.
The new ISD opcode OBSCURE_COPY gets lowered to a new pseudo
instruction also called OBSCURE_COPY. This ensures it cannot be
rematerialised and we expand this into a simple move very late in
the machine instruction pipeline.
A new test is added here:
CodeGen/AArch64/sme-streaming-interface.ll
Differential Revision: https://reviews.llvm.org/D134940
This makes sure that the instructions of the prologue matches the
SEH opcodes.
Also remove a couple redundant cases of setting HasWinCFI; it was
already set unconditionally after the conditional cases.
Differential Revision: https://reviews.llvm.org/D135101
These intrinsics are simply expanded to regular icmp/fcmp instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121594
The patch selects VSELECT_VL/VP_MERGE_VL that uses VF(N)M(ACC|SAC) as its
true operand and the addend of the true operand as its false operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135080
The allocation hints for copies of ACC registers assumed that we would only be
copying between VSRp and UACC registers. In reality it is also possible to copy
between UACC and ACC registers.
This patch adds a new case for the ACC copy to fix that issue.
Note that the test case added with this patch will hit an assert without the
fix.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D134501
(Re-Apply with fixes to clang MicrosoftMangle.cpp)
This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.
This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf
As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.
Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:
* `F8M<N>` : For FP8 types that can be conceived of as following the
same rules as FP16 but with a smaller number of mantissa/exponent
bits. Including the number of mantissa bits in the type name is enough
to fully specify the type. This naming scheme is used to represent
the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
values.
The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).
Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.
MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.
(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)
Differential Revision: https://reviews.llvm.org/D133823
Once we are in the `Unreachable` state we want to disable type checking, but
we were unconditionally returning `true` here, which means we encountered
an error. Instead we unconditionally return false to signal no error.
Fixes: https://github.com/llvm/llvm-project/issues/56935
Differential Revision: https://reviews.llvm.org/D135195
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.
The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.
This needs to be one commit otherwise we start causing tests to fail due to
incomplete support for selection etc.
This information is not preserved in MIR today. So this patch adds
information to RISCVMachineFunctionInfo when the vreg is created for
the argument.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D134621
Loop versioning changes the control-flow, which may impact SCEVs cached
by for other loops in LoopAccessInfoManager. Clear the manager after
making changes.
Fixes #57825.
Depends on D134609.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134611
performCONDCombine removes the `and` with 0xff in patterns of
SUBS (and (add(..), 0xff), C)
under certain complex conditions. It doesn't come up often,
but in the lowering of usub.sat, where the SUBS is used both as a
condition and as a value, the `and` is removed where it would only be
valid for the condition.
Fixes #58109.
Differential Revision: https://reviews.llvm.org/D135043
Since SROA chooses promotion based on the reaching loads/stores of allocas, we may run into scenarios in which we alloca a vector but promote it to an integer. The result is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined come from ExtractVectorElements which stem from a shared load.
This patch identifies such a pattern and combines it into a load.
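For instance (a sketch with assumed types; endianness matters for the
combined load):
```
%v  = load <2 x i8>, ptr %p
%e0 = extractelement <2 x i8> %v, i64 0
%e1 = extractelement <2 x i8> %v, i64 1
%z0 = zext i8 %e0 to i16
%z1 = zext i8 %e1 to i16
%hi = shl i16 %z1, 8
%r  = or i16 %hi, %z0
; on a little-endian target this combines to:
%r2 = load i16, ptr %p
```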
The function isOuterMostDepPositive() is checked after negative dependence
vectors are normalized to be non-negative, so there will not be any negative
dependency ('>' as the outermost non-equal sign) after normalization.
Therefore the check in isOuterMostDepPositive() is irrelevant and redundant.
Reviewed By: congzhe
Differential Revision: https://reviews.llvm.org/D132982
AArch64LoadStoreOptimizer has a bunch of different guards to avoid
corrupting Windows SEH prologues/epilogues, but apparently we missed the
case of merging two instructions where the first instruction isn't part
of the epilogue, but the second instruction is.
Fixes issue discovered at https://reviews.llvm.org/D130049#3704064
Differential Revision: https://reviews.llvm.org/D134992
This code adds initial support for generating the HLSL resources
metadata entries. It has a lot of `FIXMEs` laying around because there
is a lot more work to do here, but this lays a solid groundwork and can
accurately handle some trivial cases.
I've filed a swath of issues covering the deficiencies here and left the
issues in comments so that we can easily follow them.
One big change to make sooner rather than later is to move some of this
code into a new libLLVMFrontendHLSL so that we can share it with the
Clang CodeGen layer.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D134682
This is a split of D134250.
Support for parsing and dumping the LC_DATA_IN_CODE contents (as binary
data).
This allows more complete testing of llvm-objdump in D133974.
Reviewed By: Higuoxing
Differential Revision: https://reviews.llvm.org/D134569
There are a few changes mixed in here.
- Try to reuse the destination register from ADDI instead of always
  creating a virtual register. This way we lean on the register
  scavenger in fewer cases.
- Explicitly reuse the primary virtual register when possible. There's
  still a case where getVLENFactoredAmount and handling large
  fixed offsets can both create a secondary virtual register.
- Combine similar BuildMI calls by manipulating the Register variables.
There are still a couple of early outs for ADDI, but overall I tried to
arrange the code into steps.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135009
The old code took two different paths based on whether there is
a scalable offset, but these two paths had some code in common.
The main difference between the two code paths was whether we needed
to create a GPR or not for the ADDI that gets created for RVVSpill.
If we had a scalable offset, the same GPR was used as the destination
for adding the scalable offset and the ADDI. To manage this, we now
cache the scratch register and reuse it if it has already been created.
This is a pre-patch for D135009.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D135092
`dumpExportEntry` was dumping everything using signed LEB128, but
the format uses unsigned LEB128. This can be cross-checked against
the implementations in MachOObjectFile.cpp, LLD's ExportTrie.cpp, and
macho2yaml.cpp, which all use ULEB128 functions.
The difference is only apparent when encoding some values with
specific bit patterns (a bit set in the 7th, 14th, ... bit of the
value). The encoding did not always create problems in the resulting
binaries: if the extra byte was part of the padding, decoding it as
ULEB128 gives the same result as decoding it as SLEB128. However, the
code in MachOObjectFile.cpp (used by llvm-objdump) checks the buffer
decoding position against the reported length, which triggered an
error.
Modified a test to use an address with this pattern (0x3FA0, where the
14th bit is set), to show that a round trip still produces the same
results, and added a check using llvm-objdump so that its extra
validation exercises this implementation.
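To make the difference concrete with the test's address: 0x3FA0
(16288) encodes as the two bytes 0xA0 0x7F in ULEB128, but as the
three bytes 0xA0 0xFF 0x00 in SLEB128, because bit 6 of the second
byte would otherwise be read as a sign bit. Decoding those three bytes
as ULEB128 still yields 16288 (the extra byte contributes zero), which
is why the bug stayed hidden until the decoding position was checked
against the reported length.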
Reviewed By: pete
Differential Revision: https://reviews.llvm.org/D134563
https://alive2.llvm.org/ce/z/oShzr3
This was noted as a missing fold in D134876 (with additional
examples based on issue #58046).
I'm assuming that fmul with a zero operand is rare enough
that the use of ValueTracking will not noticeably increase
compile-time.
This adjusts a PowerPC codegen test that was added with D88388
because it would get folded away and no longer provide coverage
for the bug fix.
In the canonical form of the shuffle, the poison/undef operand is the
second operand; the patch tries to emit the canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting a freeze instruction for shuffles
with undef indices if the second shuffle operand is undef rather than
poison. It is an initial step towards D93818, where undef mask
elements are treated as returning poison values.
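A minimal illustration of the canonicalization (hypothetical IR):
; non-canonical: poison as the first operand
%s = shufflevector <2 x i32> poison, <2 x i32> %v, <2 x i32> <i32 2, i32 3>
; canonical: poison as the second operand, with the mask remapped
%s = shufflevector <2 x i32> %v, <2 x i32> poison, <2 x i32> <i32 0, i32 1>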
Differential Revision: https://reviews.llvm.org/D134377
The bitmask used to extract the bits assumed 16-bit elements and
didn't take the actual element size into account.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135156
Fix a crash in the FMA combine added by D132837 and amended by D134810.
In cases where the newly created node could be folded, the combiner
would fail this assertion:
llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.
Differential Revision: https://reviews.llvm.org/D135150
If the 'order(concurrent)' clause is specified, the iterations of a
SIMD loop can be executed concurrently.
This patch adds support for LLVM IR codegen via OMPIRBuilder for SIMD loop
with 'order(concurrent)' clause. The functionality added to OMPIRBuilder is
similar to the functionality implemented in 'CodeGenFunction::EmitOMPSimdInit'.
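For reference, a rough sketch of the IR shape this produces (the same
access-group scheme used by 'CodeGenFunction::EmitOMPSimdInit'; names
are made up): loads/stores in the loop body are tagged with an access
group that the loop metadata declares parallel:
%val = load float, ptr %addr, align 4, !llvm.access.group !1
...
br i1 %cond, label %body, label %exit, !llvm.loop !2

!1 = distinct !{}
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.parallel_accesses", !1}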
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134046
Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
Reapply with a fix for the case where an operand simplified back
to the original phi: We need to map this case to the new phi node.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
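A minimal sketch of the kind of fold this enables (hypothetical IR;
the old code bailed here because neither incoming value constant-folds):
%p = phi i32 [ %x, %bb0 ], [ 0, %bb1 ]
%r = or i32 %p, %x
; per-incoming simplification: 'or %x, %x' -> %x and 'or 0, %x' -> %x,
; so the op folds into the phi:
%r = phi i32 [ %x, %bb0 ], [ %x, %bb1 ]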
This currently does not make much of a difference (only one test is
affected), but it is helpful e.g. for the out-of-tree CHERI target,
where Builder.CreateMemCpy() can add attributes other than parameter
alignment.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135075
The helpers in BuildLibCalls normally expect that the Value
arguments already have the correct type (matching the lib call
signature). An exception has been emitFPutC, which cast the Char
argument to 'int' using CreateIntCast. This patch moves the cast to
the caller instead of doing it inside emitFPutC.
I think it makes sense to make the BuildLibCalls APIs a bit more
consistent this way, despite the need to handle the int cast in two
different places now.
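A sketch of the resulting call-site shape (hypothetical IR, assuming a
32-bit 'int'): the caller now performs the promotion before invoking
the helper:
%c.int = sext i8 %c to i32   ; signed char-to-int promotion, done by the caller
%ret = call i32 @fputc(i32 %c.int, ptr %file)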
Differential Revision: https://reviews.llvm.org/D135066
Stop assuming that an 'int' is 32 bits in helpers that emit libcalls
to lib functions that have 'int' in the signature. For most targets
this is NFC. For a target with a 16-bit 'int' this could help detect
attempts to emit a libcall with an incorrect signature.
Similarly we now derive the type mapping to 'size_t' by asking TLI
about the size of 'size_t'. This should be NFC (at least for in-tree
targets) since getSizeTSize(), in TLI, is deriving the size in the
same way as DataLayout::getIntPtrType().
Differential Revision: https://reviews.llvm.org/D135065
Lots of BuildLibCalls helpers are using Builder::getInt32Ty to get a
type matching an 'int', and DataLayout::getIntPtrType to get a type
matching 'size_t'. The former is not correct for all targets, since an
'int' isn't always 32 bits. And the latter is a bit weird as well,
since the definition of DataLayout::getIntPtrType doesn't clearly map
it to 'size_t'.
This patch is not aiming at solving any such problems. It is merely
highlighting when a libcall is expecting to use 'int' and 'size_t'
by naming the types as IntTy and SizeTTy when preparing the type
signatures for the emitted libcalls.
Differential Revision: https://reviews.llvm.org/D135064
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.
Depends on D134608.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134609
If nonnull is already set, we currently skip setting both nonnull
and dereferenceable. Make these independent, to avoid regressions
when additional nonnull attributes are inferred earlier.
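A hypothetical before/after sketch (made-up callee) of what this
unblocks:
; before: nonnull was already present, so nothing else was added
%p = call nonnull ptr @get_buffer(i64 16)
; after: dereferenceable is now inferred independently
%p = call nonnull dereferenceable(16) ptr @get_buffer(i64 16)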
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from
D134606. As a side-effect this also removes printing support from
LoopAccessLegacyAnalysis.
Depends on D134606.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134608
One of the sources is the same size as the destination, so that
source doesn't overlap the destination register. By using the _TIED
form we avoid an early-clobber constraint for that source.
This matches what was already done for intrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.