Our current strategy of computing ranges of SCEVUnknown Phis was to simply
compute the union of ranges of all its inputs. In order to avoid infinite recursion,
we mark Phis as pending and conservatively return full set for them. As result,
even simplest patterns of cycled phis always have a range of full set.
This patch makes this logic a bit smarter. We basically do the same, but instead
of taking inputs of single Phi we find its strongly connected component (SCC)
and compute the union of all inputs that come into this SCC from outside.
Processing entire SCC together has one more advantage: we can set range for all
of them at once, because the only thing that happens to them is the same value is
being passed between those Phis. So, despite we spend more time analyzing a
single Phi, overall we may save time by not processing other SCC members, so
amortized compile time spent should be approximately the same.
Differential Revision: https://reviews.llvm.org/D110620
Reviewed By: reames
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D119686
`UsedAssumedInformation` is a return argument utilized to determine what
information is known. Most APIs used it already but
`genericValueTraversal` did not. This adds it to `genericValueTraversal`
and replaces `AllCallSitesKnown` of `checkForAllCallSites` with the
commonly used `UsedAssumedInformation`.
This was supposed to be a NFC commit, then the test change appeared.
Turns out, we had one user of `AllCallSitesKnown` (AANoReturn) and the
way we set `AllCallSitesKnown` was wrong as we ignored the fact some
call sites were optimistically assumed dead. Included a dedicated test
for this as well now.
Fixes https://github.com/llvm/llvm-project/issues/53884
It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.
This patch hides this stuff behind some helper functions:
* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`
This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.
Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.
Some other quality-of-code improvements:
* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`
This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
In a prior review I was asked to move the helper function canIgnoreSNaN()
out to FPEnv.h. This wasn't possible at the time because that function
needs the fast math flags, and including them includes lots of other stuff
that isn't needed.
This patch moves the fast math flags out into a new FMF.h file unchanged,
and moves the helper function out to FPEnv.h also unchanged. This ticket
only moves code around.
Differential Revision: https://reviews.llvm.org/D119752
The copy and pristine versions of Utility diverged and one didn't
include <algorithm>. As that's a rather large header, let's just open
code the comparisons.
Reviewed By:
Differential Revision: https://reviews.llvm.org/
This relands 73e585e44d (and 0574b5fc65), with a fix for
the failing test (by using Optional<StringRef>s instead of
making StringRef::empty() mean absence of value).
Differential Revision: https://reviews.llvm.org/D118070
parseNestedName's main loop allowed parsing a grammar that was more
flexible than the actual grammar. This refactors that to rule out
some more incorrect manglings.
1) The 'L' extension only applies to unqualified-name components, so
check it just there.
2) The 'M' suffix is, AFAICT, removed from the grammar. Rather than
eliminate it, let's parse it after we've parsed a component.
Added some additional bad mangling tests, which are now rejected.
I don't break the 'T' and 'D[tT]' cases out of the loop, even though
they can only appear at first position, as it seems simpler to just
check there is nothing SoFar.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D119542
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
My plan is to handle more complex operations in follow-up patches.
Reviewers: frasercrmck
Differential Revision: https://reviews.llvm.org/D118253
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Add passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support
i64 scalar in rv32.
The masked VSLIDE1 would only emit mask undisturbed policy regardless
of giving mask agnostic policy until InsertVSETVLI supports mask agnostic.
Reviewed by: craig.topper, rogfer01
Differential Revision: https://reviews.llvm.org/D117989
vp.select|merge both select lanes based on a condition mask. Unlike
other VP intrinsics the lanes are defined where the condition mask is
false. Hence, the condition mask in vp.select|mask is not a mask in the
sense of VP intrinsics. By doing not treating the condition mask
specially, vp.select becomes the canonical VP translation of the select
instruction.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118981
Since https://reviews.llvm.org/D53892 it is possible to emit a custom stackmap by overwriting the emitStackMaps method of GCMetadataPrinter. That way even AOT compilers can generate a more efficient and more suitable format for their needs.
This patch updates documentation and stale comments in source code. In particular it removes the issue from the issue list in the Statepoints documentation and adjusts comments in GCStrategy.
Differential Revision: https://reviews.llvm.org/D119660
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:
llvm/DebugInfo/DWARF/DWARFContext.h no longer includes:
- "llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h"
- "llvm/DebugInfo/DWARF/DWARFCompileUnit.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAranges.h"
- "llvm/DebugInfo/DWARF/DWARFDebugFrame.h"
- "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
- "llvm/DebugInfo/DWARF/DWARFDebugMacro.h"
- "llvm/DebugInfo/DWARF/DWARFGdbIndex.h"
- "llvm/DebugInfo/DWARF/DWARFSection.h"
- "llvm/DebugInfo/DWARF/DWARFTypeUnit.h"
- "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
Plus llvm/Support/Errc.h not included by a bunch of llvm/DebugInfo/DWARF/DWARF*.h files
Preprocessed lines to build llvm on my setup:
after: 1065629059
before: 1066621848
Which is a great diff!
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119723
The new encoder directive can be used to specify custom encoder for a
single operand or slice. This is different from the EncoderMethod field
within an Operand, which affects every operands in the target.
In addition, this patch also changes the function signature of the
encoder method -- a new argument, InsertPost, is added to both the
default one (i.e. getMachineValue) and the custom one. This argument
provides the bit position where the operand will eventually be inserted.
Differential Revision: https://reviews.llvm.org/D119100
This reverts commit e6999040f5.
Update test to fix signed int comparison warning, fix whitespace in
compiler-rt MIBEntryDef.inc file.
Differential Revision: https://reviews.llvm.org/D117256
Currently you can run the DWARF verifier on the linked dsymutil output.
This patch extends this functionality and makes it possible to
run the DWARF verifier on the input as well.
A new option --verify-dwarf allows you to specify input, output, all and
none. The existing --verify flag remains unchanged and acts and alias
for --verify-dwarf=output.
Input verification issues do not result in a non-zero exit code because
dsymutil is capable of taking invalid DWARF as input and producing valid
DWARF as output.
Differential revision: https://reviews.llvm.org/D89216
This reverts commit 857ec0d01f.
Fixes -DLLVM_ENABLE_MODULES=On build by adding the new textual
header to the modulemap file.
Reviewed in https://reviews.llvm.org/D117722
Separate MCRegisterInfo::regsOverlap out from
TargetRegisterInfo::regsOverlap. This is useful in the AMDGPU AsmParser
where we only have access to MCRegisterInfo.
Differential Revision: https://reviews.llvm.org/D119533
This introduces a new "ptrauth" operand bundle to be used in
call/invoke. At the IR level, it's semantically equivalent to an
@llvm.ptrauth.auth followed by an indirect call, but it additionally
provides additional hardening, by preventing the intermediate raw
pointer from being exposed.
This mostly adds the IR definition, verifier checks, and support in
a couple of general helper functions. Clang IRGen and backend support
will come separately.
Note that we'll eventually want to support this bundle in indirectbr as
well, for similar reasons. indirectbr currently doesn't support bundles
at all, and the IR data structures need to be updated to allow that.
Differential Revision: https://reviews.llvm.org/D113685
This patch changes the argument from template-IRBuilder to IRBuilderBase
thus allowing us to write less code while getting the location from a
builder.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119717
This reverts commit 0f73fb18ca.
Use llvm/Profile/MIBEntryDef.inc instead of relative path.
Generated the raw profile data with `-mllvm
-enable-name-compression=false` so that builbots where the reader is
built without zlib do not fail.
Also updated the test build instructions.
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
While the contents of the profile are backwards compatible the header
itself is not. For example, when adding new fields to the header results
in significant issues. This change adds allows for portable
instantiation of the header across indexed format versions.
Differential Revision: https://reviews.llvm.org/D118390
Use the macro based format to add a wrapper around the MemInfoBlock
when stored in the MemProfRecord. This wrapped block can then be
serialized/deserialized based on a schema specified by a list of enums.
Differential Revision: https://reviews.llvm.org/D117256
This patch refactors out the MemInfoBlock definition into a macro based
header which can be included to generate enums, structus and code for
each field recorded by the memprof profiling runtime.
Differential Revision: https://reviews.llvm.org/D117722
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e. we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.
That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.
This patch extends the `uwtable` attribute with an optional
value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
We have to special-case 'u 8__uuidof [tz]' demangling for legacy
support. That handling is a little duplicative.
* It seems better to just push the single expected node.
* We can also use 'consumeIf' rather than open-coding the peeking and increment.
* We don't need the numLeft < 2 check, as if there are few than that
other paths will end up with detecting the error.
FWIW This simplifies a future change adding operator precedence.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D119543
The output buffer growth algorithm had a few issues:
a) An off-by-one error in the initial size check, which uses
'>='. This error was safe, but could cause us to reallocate when there
was no need.
b) An inconsistency between the initial size check (>=) and the
post-doubling check (>). The latter was somewhat obscured by the
swapped operands.
c) There would be many reallocs with an initially-small buffer. Add a
little initialization hysteresis.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D119177
This was deprecated before the LLVM 14 branch cut, remove the
method now.
As a temporary workaround, Type::getPointerElementType() can be
used instead.
See https://llvm.org/docs/OpaquePointers.html for information on
the opaque pointers migration.