This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.
This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf
As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.
Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:
* `F8M<N>` : For FP8 types that can be conceived of as following the
same rules as FP16 but with a smaller number of mantissa/exponent
bits. Including the number of mantissa bits in the type name is enough
to fully specify the type. This naming scheme is used to represent
the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
values.
The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).
Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.
MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.
(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)
Differential Revision: https://reviews.llvm.org/D133823
1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType.
This will allow use GlobalVariable/Function as value.
2. Save target type for global ctors for Constant.
3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D133283
Functions that implement expansion of response and config files depend
on many options, which are passes as arguments. Extending the expansion
requires new options, it in turn causes changing calls in various places
making them even more bulky.
This change introduces a class ExpansionContext, which represents set of
options that control the expansion. Its methods implements expansion of
responce files including config files. It makes extending the expansion
easier.
No functional changes.
Differential Revision: https://reviews.llvm.org/D132379
Add a utility function which returns true if the given value is a constant
false value.
This is necessary to port one of the compare simplifications in
TargetLowering::SimplifySetCC.
Differential Revision: https://reviews.llvm.org/D91754
This commit adds in two new features to the ML regalloc eviction
analysis that can be used in ML models, a vector of MBB frequencies and
a vector of indicies mapping instructions to their corresponding basic
blocks. This will allow for further experimentation with per-instruction
features and give a lot more flexibility for future experimentation over
how we're extracting MBB frequency data currently.
Reviewed By: mtrofin, jacobhegna
Differential Revision: https://reviews.llvm.org/D134166
Functions that implement expansion of response and config files depend
on many options, which are passes as arguments. Extending the expansion
requires new options, it in turn causes changing calls in various places
making them even more bulky.
This change introduces a class ExpansionContext, which represents set of
options that control the expansion. Its methods implements expansion of
responce files including config files. It makes extending the expansion
easier.
No functional changes.
Differential Revision: https://reviews.llvm.org/D132379
Previous commit 8b00b24f85 missed to add `int_ceil` anchor for the
llvm.ceil.* section under LangRef.rst
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D134586
A given function is compatible with all previous arch versions.
To avoid compering values of the attribute this logic adds all predecessor
architecture values.
Reviewed By: dmgreen, DavidSpickett
Differential Revision: https://reviews.llvm.org/D134353
It allows finding all intervals that overlap with any given point.
At this time, it does not support any deletion or rebalancing
operations.
The IntervalTree is designed to be set up once, and then queried
without any further additions.
Reviewed By: psamolysov, probinson
Differential Revision: https://reviews.llvm.org/D125776
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D134639
Prefixing the the SubArch with plus sign makes the ArchFeature name.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D134349
This patch adds support for if clause to task construct in OpenMP
IRBuilder.
Reviewed By: raghavendhra
Differential Revision: https://reviews.llvm.org/D130615
Spurious ref edges are ref edges that still exist in the call graph even
though the corresponding IR reference no longer exists. This can cause
issues when deleting a dead function which has a spurious ref edge
pointed at it because currently we expect the dead function's RefSCC to
be trivial.
In the case that the dead function's RefSCC is not trivial, remove all
ref edges from other nodes in the RefSCC to it.
Removing a ref edge can result in splitting RefSCCs. There's actually no
reason to revisit those RefSCCs because currently we only run passes on
SCCs, and we've already added all SCCs in the RefSCC to the worklist.
(as opposed to removing the ref edge in
updateCGAndAnalysisManagerForPass() which can modify the call graph of
SCCs we have not visited yet). We also don't expect that RefSCC
refinement will allow us to glean any more information for optimization
use. Also, doing so would drastically increase the complexity of
LazyCallGraph::removeDeadFunction(), requiring us to return a list of
invalidated RefSCCs and new RefSCCs to add to the worklist.
Fixes#56503
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D133907
They're roughly ARMv8.6. This works in the .td file, but in
AArch64TargetParser.def, marking them v8.6 brings in support for the SM4
cryptographic hash and we don't actually have that. So TargetParser side
they're marked as v8.5, with the extra features (BF16 and I8MM added manually).
Finally, A16 supports the HCX extension in addition to v8.6. This has no
TargetParser implications.
We were using the native triple to parse the benchmarks. Use the triple
from the benchmarks file.
Right now this still only allows analyzing files produced by the current
target until D133605 is in.
This also makes the `Analysis` class much less ad-hoc.
Differential Revision: https://reviews.llvm.org/D133697
Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters.
These additions include 2 new scripts along with 20 new emoji characters,
and 4,193 CJK ideographs.
This changes modify most existing tables including
- XID_Start/XID_Continue in Clang
- The character name database (used by \N{} in Clang)
- The list of formattable/printable codepoints
- The case folding algorithm (which we had not updated since Unicode 9)
- The list of nonspacing/enclosing marks used by the column width
computation algorithm. The rest of the column width algorithm
is not updated.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D133807
I don't know what was going on originally with these tests. It seems reasonable
to have the immediate be the same byte alignment unit as the IR, in which case
we need to take the log2 in order to set the right number of low bits.
This fixes a miscompile in chromium.
Differential Revision: https://reviews.llvm.org/D134380
Providing access to the mapping of annotations allows test helpers to
be expressive by using the annotations as expectations. For example, a
matcher could verify that all annotated points were matched by a
matcher, or that an refactoring surgically modifies specific ranges.
Differential Revision: https://reviews.llvm.org/D134072
This patch deprecates llvm::empty as I've migrated all known uses of
llvm::empty(x) to x.empty().
Differential Revision: https://reviews.llvm.org/D134141
Serialized calls to void-wrapper-functions should have zero bytes of argument
data, but accessing ArgData[0] may (and will, in the case of SmallVector) fail
if the argument data buffer is empty.
This commit fixes the issue by adding a check for empty argument buffers.
I'm planning to deprecate and eventually remove llvm::empty.
Note that no use of llvm::empty requires the ability of llvm::empty to
determine the emptiness from begin/end only.
This patch adds in instruction based features to the regalloc advisor
gated behind a flag so a user can decide at runtime whether or not they
want to enable the feature. The features are only enabled when LLVM is
compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true.
To extract the instruction features, I'm taking a list of segments from
each LiveInterval and noting the start and end SlotIndices. This list is then
sorted based on the start SlotIndex and I iterate through each SlotIndex
to grab instructions, making sure to check for overlaps. This results in
a vector of opcodes and binary mapping matrix that maps live ranges to the
opcodes of the instructions within that LR.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D131930
This improves consistency with other places (e.g. llvm::compression::decompress,
llvm::object::Decompressor::decompress, llvm-objcopy).
Note: when zstd::uncompress was added, we noticed that the API `ZSTD_decompress`
is fine while the zlib API `uncompress` is a misnomer.
This reverts commit 01ffe31cbb.
A build breakage with GCC 7.3 has been reported:
https://reviews.llvm.org/D132311#3797053
FWIW, GCC 7.5 is OK according to Pavel Chupin. I also personally
tested GCC 8.4.0.
This NFC prepares the TimeProfiler to support the construction
and completion of time profiling 'entries' across threads.
Add ClockType alias so we can change the clock in one place.
(trivial) Use c++ usings instead of typedefs
Rename Entry to TimeTraceProfilerEntry since this type will eventually become public.
Add an intro comment.
Add some smoke unit tests.
Reviewed By: russell.gallop, rriddle, lattner, jloser
Differential Revision: https://reviews.llvm.org/D133153
There are two ctlz intrinsics here with the zero_is_poison flag
set. There are also two comparisons that check if either of the
inputs the ctlzs are zero. We need to use a logical or to block
the poison from the ctlz if either of the inputs is zero.
Reviewed By: arsenm, aqjune
Differential Revision: https://reviews.llvm.org/D130680
Currently, FunctionModRefBehavior tracks whether the function reads
or writes memory (ModRefInfo) and which locations it can access
(argmem, inaccessiblemem and other). This patch changes it to track
ModRef information per-location instead.
To give two examples of why this is useful:
* D117095 highlights a weakness of ModRef modelling in the presence
of operand bundles. For a memcpy call with deopt operand bundle,
we want to say that it can read any memory, but only write argument
memory. This would allow them to be treated like any other calls.
However, we currently can't express this and have to say that it
can read or write any memory.
* D127383 would ideally be modelled as a separate threadid location,
where threadid Refs outside pre-split coroutines can be ignored
(like other accesses to constant memory). The current representation
does not allow modelling this precisely.
The patch as implemented is intended to be NFC, but there are some
obvious opportunities for improvements and simplification. To fully
capitalize on this we would also want to change the way we represent
memory attributes on functions, but that's a larger change, and I
think it makes sense to separate out the FunctionModRefBehavior
refactoring.
Differential Revision: https://reviews.llvm.org/D130896
The mutation the action generates tries to change the input type into the
element type of larger vector type. This doesn't work if the larger element
type is a vector of pointers since it creates an illegal mutation between
scalar and pointer types.
Differential Revision: https://reviews.llvm.org/D133671
This patch adds a utility class that will be used in subsequent patches
for parsing the function/callsite attributes and determining whether
changes to PSTATE.SM are needed, or whether a lazy-save mechanism is
required.
It also implements some of the restrictions on the SME attributes
in the IR Verifier pass.
More details about the SME attributes and design can be found
in D131562.
Reviewed By: david-arm, aemerson
Differential Revision: https://reviews.llvm.org/D131570
In the Tensorflow C lib utilities, an error gets thrown if some features
haven't gotten passed into the model (due to differences in ordering
which now don't exist with the transition to TFLite). However, this is
not currently the case when using TFLiteUtils. This patch makes some
minor changes to throw an error when not all inputs of the model have
been passed, which when not handled will result in a seg fault within
TFLite.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D133451