mamba/docs/source/developer_zone/internals.rst

323 lines
10 KiB
ReStructuredText
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Internals of mamba
==================
Mamba comes with a C++ core (for speed and efficiency), and a clean Python API on top. The core of mamba uses:
- ``libsolv`` to solve dependencies (the same library used in RedHat dnf and other Linux package managers)
- ``curl`` for reliable and fast downloads
- ``libarchive`` to extract the packages
.. _libsolv_internals:
libsolv
-------
pool
****
| The ``pool`` holds pointers of almost all the data manipulated in ``libsolv``.
Some are presented here:
- string pool ``Stringpool``
- relations of dependency ``Reldep``
- solvables/packages ``Solvable``
- repositories ``Repo``
- data about providers of a specific name or relation, as ``Id`` s and ``Offset`` s (resp. ``int`` and ``unsigned int``) for maximum performance
Sizes of allocated memory or offset to the first free slot are also stored.
.. mermaid::
:align: center
classDiagram
class Pool{
+Stringpool ss
+Reldep *rels
+int nrels
+Repo **repos
+int nrepos
+int urepos
+Repo *installed
+Solvable *solvables
+int nsolvables
+Offset *whatprovides
+Offset *whatprovides_rel
+Id *whatprovidesdata
+Offset whatprovidesdataoff
+int whatprovidesdataleft
...
}
Ids
###
Strings (package names, requirements, etc.) are converted to ``Id`` s using a hash table for efficient data manipulation (storage, usage).
- ``Id pool_str2id(Pool *pool, const char *str, int create)`` is used to get an ``Id`` from a string
- It eventually creates a fresh one if not existing
- ``const char *pool_id2str(const Pool *pool, Id id)`` is used to get back the string from ``Id``
.. warning::
Do not use ``solvable`` ``Id`` s with ``pool_id2str``, use the equivalent ``const char *pool_solvid2str(Pool *pool, Id p)``
Reldeps
#######
A ``Reldep`` describes a relation of dependency with:
- a ``name`` Id
- an epoque version ``evr`` Id
- some ``flags``: ``REL_CONDA`` in our case
.. mermaid::
:align: center
classDiagram
class Reldep{
+Id name
+Id evr
+int flags
}
An example could be ``{1, 2, 30}``, with:
- ``1`` the Id for ``"xtensor"``
- ``2`` the Id for ``">=0.20"``
- ``30`` the ``REL_CONDA`` flag
| The pool holds a ``Reldep *rels`` pointer on the first memory location and the size ``int nrels`` of allocated memory.
``Reldep`` are also handled with ``Id`` s that have bit 31 used as a flag to distinguish them from regular ``Id`` s.
There are multiple macros that help to convert those special ``Id`` s:
- ``MAKERELDEP(id)``: turn a regular ``Id`` into a ``Reldep`` ``Id``
- ``ISRELDEP(id)``: test the bit 31
- ``GETRELID(id)``: get the regular ``Id``
- ``GETRELDEP(pool, id)``: get the ``Reldep`` from its ``Id``
.. note::
``pool_id2str`` also works with ``Reldep`` ``Id`` s! But it will only returns the ``Reldep`` 's ``name``
Offsets
#######
An ``Offset`` represents a positive or negative shift of a pointer on an array.
For example, a *solvable* does not contain all its data but rather holds multiple offsets on its ``repo->idarraydata`` storage.
- ``idarraydata`` is an ``Id`` pointer
- ``provides``, ``requires``, etc. are offsets in ``idarraydata``
whatprovides
************
A ``provider`` is a *solvable* fulfilling a *specification*. The following definitions are key to disambiguate how ``libsolv`` works:
- a *package* is an identification of the resource handled: a name such as ``xtensor``
- a *solvable* is a specific version of the *package*. It can be assimilated to its tarball.
- in ``Mamba``, a *solvable* name MUST match the package name
- ``libsolv`` handles cases where *solvables* are providing different entities than what identified in their names (example: ``pynum`` providing ``numpy``)
- this is not used, but important to know to understand the terminology
- a *specification*: an expression to match *solvables* providing the same *package*
- the package name can be used to select all providers/*solvables*
- a more specific *spec* like ``s2="xtensor>=0.20"`` will only match a subset of *solvables*: the ones that have version >=0.20 (whatever the build string)
Let's take a simple example to recap:
- the package: ``xtensor``
- solvable(s):
- ``xtensor=0.20.10=hc9558a2_0``
- ``xtensor=0.23.10=h4bd325d_0``
- etc.
- specification(s):
- ``xtensor``
- ``xtensor>=0.20``
- etc.
.. note::
It's possible that a *package* is not provided by any *solvable*. It is then uninstallable.
.. mermaid::
:align: center
%%{init: {'themeVariables':{'edgeLabelBackground':'white'}}}%%
graph TD
subgraph " "
spec1((s1)) -.-> solvable1:::solvable
spec1 -.-> solvable2:::solvable
...:::empty
spec1 -.-> solvableN:::solvable
spec2((s2)) -.-> solvableN:::solvable
end
classDef solvable fill:#f96;
classDef empty fill:#ffffff00,stroke:#ffffff00;
.. note::
Specifications are stored as ``Id`` s (see the previous section)
.. warning::
While a ``solvable`` is a ``libsolv`` notion, ``specs`` and ``packages`` are not.
Storage
#######
| The free function ``void pool_createwhatprovides(Pool *pool)`` is used to create hashes over pool of solvables to ease provide lookups.
It computes and store what *solvables* provide each *spec*, using a two-step indirect list:
- from the spec ``Id``, an offset is computed in ``Id *whatprovidesdata``
- ``Offset *whatprovides`` stores regular string specs
- ``Offset *whatprovides_rel`` stores ``Reldep`` specs
- ``whatprovidesdata`` at the given offset is a 0-terminated list of *solvables* ``Id`` s, providing the spec ``Id``
.. image:: whatprovides.svg
:height: 300
:align: center
Lookups
#######
The ``pool_whatprovides(Pool *pool, Id d)`` function returns the offset of the first solvable ``Id`` in ``whatprovidesdata``:
.. mermaid::
:align: center
graph LR
A{"ISRELDEP(d)?"} -->|Yes| B1["whatprovides[d]"];
A -->|No| B2["whatprovides_rel[GETRELID(d)]"];
B1 --> C{0?};
B2 --> C;
C -->|Yes| D1["pool_addrelproviders[d]"];
C -->|No| D2((return));
D1 --> D2;
Rules
*****
A *rule* is all about *solvables*, it represents a logical disjunction ``OR`` between one or more literals.
Rules are created to translate in mathematical logic:
- a specification: installation/removal/updates
- ``(A)`` means ``A`` must be installed
- ``(-A)`` means ``A`` must be removed or kept uninstalled
- ``(A|B1|B2|...)`` means ``A`` could be updated with ``B1``, ``B2``, etc.
- a dependency: ``A`` needs/requires ``b`` (a *spec*)
- ``(-A|B1|B2|...)`` means ``A`` requires one of ``B1``, ``B2``, etc. (``b`` providers)
- an incompatibility: ``A1`` can't be installed alongside ``A2``
- ``(-A1|-A2), (-A1|-A3), ...`` means ``A1`` is not compatible with ``A2``, ``A3``, etc.
- this is a common case: multiple providers of the same package
- etc.
Still for efficiency, *rules* are storing ``Id`` s of *solvables* and *specs*:
- ``p`` is the package ``Id`` of ``A``
- ``d`` is the package ``Id`` offset into the list of providers (negative value means the rule is disabled)
- ``w1`` and ``w2`` are watches
- ``n1`` and ``n2`` are the next rules in linked-lists, corresponding to ``w1`` and ``w2``
.. mermaid::
:align: center
classDiagram
class Rule{
+Id p
+Id d
+Id w1
+Id w2
+Id n1
+Id n2
}
Dependencies
############
Each dependency is turned into a rule to be satisfied during the solving stage.
Example:
- ``p1`` and ``p2`` are 2 specs
- ``p1`` is provided by a single solvable ``s11``
- ``p2`` is provided by ``s21`` and ``s22``
.. mermaid::
:align: center
%%{init: {'themeVariables':{'edgeLabelBackground':'white'}}}%%
graph LR
subgraph p1 providers[ ]
p1((p1)):::package -.-> s11:::solvable
end
s11:::solvable --> p2((p2)):::package
subgraph p2 providers[ ]
p2((p2)):::package -.-> s21:::solvable
p2((p2)):::package -.-> s22:::solvable
end
classDef solvable fill:#f96;
The corresponding rule ``r`` is ``-s11|s21|s22``.
That means that taking the decision to install ``s11``:
- ``-s11`` is not satisfied
- either ``s21`` or ``s22`` need to be satisfied
.. note::
Exclusive rules are used to avoid installation of multiple solvables providing the same package
Watches
#######
| Watches are a way to link rules. They are triggered during a phase called ``propagation`` after decisions taken on previous rules for some solvable.
The possible decisions on solvables are ``installation`` or ``removal``/``conflict``, this is stored as resp. positive and negative Ids.
Related rules are then evaluated during another level of decision: those are the one with an opposite first literal.
Example:
- the install spec is to install ``p1`` provided by a single solvable ``s1``: ``r1=(s1)``
- ``s1`` depends on spec ``p2``, provided by ``s21`` or ``s22``: ``r2=(-s1|s21|s22)``
- decision to install ``s1`` triggers rule ``r2``
Transaction
***********
Another important part of libsolv is the ``Transaction``. A transaction governs what packages are installed or removed, and a transaction is the result of a successful solve process.
A transaction in libsolv is a single list (Queue) of Solvable ``Id's`` and is thus rather simple. The Queue contains either a positive or negative ``Id``. Each negative ``Id`` is ``uninstalled`` from the environment, and each positive ``Id`` is to be ``installed``. libsolv classifies the entire range of Id's into different types of transaction operations. For example, if we have ``{ -5, 5 }`` that would be a reinstall transaction for the Solvable with ``Id == 5``.
If the ``Id's`` are different then it can be a downgrade or upgrade operation (first, the previous package needs removal before the higher or lower version can be installed). The ``transaction_classify`` and ``transaction_classify_pkgs`` functions of libsolv take care of this classification to present a nice output to the user.
Another crucial libsolv function is ``transaction_order`` to order the transaction in a way that they are installed with the lowest dependency first (topological sort). This ensures that e.g. ``python`` is installed before any packages depending on ``python`` as they are sometimes needed during installation (for example for ``noarch`` packages with ``entry_points``).
Lastly, we can force installation or explicitly install from URL's by crafting transactions without using the solver just by adding the correct ``Id's`` into the Transaction queue.