(pypolymlp-interface)=

Force constants calculation using pypolymlp (machine learning potential)

With the --pypolymlp option, phono3py can interface with the polynomial machine learning potential (MLP) code, pypolymlp, to perform training and evaluation tasks of MLPs. This feature aims to reduce the computational cost of anharmonic force constant calculations by using MLPs as an intermediary layer, efficiently representing atomic interactions.

The MLPs are trained on a dataset consisting of supercell displacements, forces, and energies. The trained MLPs are then used to compute forces for supercells with specified displacements.

For further details on combining phono3py calculations with pypolymlp, refer to A. Togo and A. Seko, J. Chem. Phys. 160, 211001 (2024) [doi] [arxiv].

Examples of its usage can be found in the example/NaCl-pypolymlp and example/AlN-rd directories in the distribution from GitHub or PyPI.

Citation of pypolymlp

"Tutorial: Systematic development of polynomial machine learning potentials for elemental and alloy systems", A. Seko, J. Appl. Phys. 133, 011101 (2023).

@article{pypolymlp,
    author = {Seko, Atsuto},
    title = "{Tutorial: Systematic development of polynomial machine learning potentials for elemental and alloy systems}",
    journal = {J. Appl. Phys.},
    volume = {133},
    number = {1},
    pages = {011101},
    year = {2023},
    month = {01},
}

Requirements

  • pypolymlp

    For Linux (x86-64), a compiled package of pypolymlp can be installed via conda-forge (recommended). Otherwise, pypolymlp can be installed from source code.

How to calculate

Workflow

  1. Generate random displacements in supercells. Use the {ref}`--rd <random_displacements_option>` option.
  2. Calculate the corresponding forces and energies in the supercells. The VASP interface is recommended because supercell energy extraction with the {ref}`--sp <sp_option>` option is currently supported only for VASP.
  3. Prepare a dataset composed of displacements, forces, and energies in supercells. The dataset must be stored in a phono3py-yaml-like file, e.g., phono3py_params.yaml. Use the {ref}`--cf3 <cf3_option>` and {ref}`--sp <sp_option>` options together.
  4. Develop MLPs. By default, 90% of the dataset is used for training and 10% for testing. At this step, polymlp.yaml is saved.
  5. Generate supercells with either systematic or random displacements.
  6. Evaluate the MLPs to obtain forces for the supercells generated in step 5.
  7. Calculate force constants from the displacement-force dataset obtained in steps 5 and 6.

Steps 4-7 are executed by running phono3py with the --pypolymlp option.

Steps 1-3: Dataset preparation

To train MLPs with pypolymlp through phono3py, the following supercell data are required:

  • Displacements
  • Forces
  • Total energies

These data must be stored in a phono3py-yaml-like file such as phono3py_params.yaml.

The supercells with displacements are generated by

% phono3py --pa auto --rd 100 -c POSCAR-unitcell --dim 2 2 2
        _                      _____
  _ __ | |__   ___  _ __   ___|___ / _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ |_ \| '_ \| | | |
 | |_) | | | | (_) | | | | (_) |__) | |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___/____/| .__/ \__, |
 |_|                                |_|    |___/
                                           3.5.0

-------------------------[time 2024-09-19 14:40:00]-------------------------
Compiled with OpenMP support (max 10 threads).
Python version 3.12.4
Spglib version 2.5.0

Unit cell was read from "POSCAR-unitcell".
-------------------------------- unit cell ---------------------------------
Lattice vectors:
  a    5.603287477054753    0.000000000000000    0.000000000000000
  b    0.000000000000000    5.603287477054753    0.000000000000000
  c    0.000000000000000    0.000000000000000    5.603287477054753
Atomic positions (fractional):
    1 Na  0.00000000000000  0.00000000000000  0.00000000000000  22.990
    2 Na  0.00000000000000  0.50000000000000  0.50000000000000  22.990
    3 Na  0.50000000000000  0.00000000000000  0.50000000000000  22.990
    4 Na  0.50000000000000  0.50000000000000  0.00000000000000  22.990
    5 Cl  0.50000000000000  0.50000000000000  0.50000000000000  35.453
    6 Cl  0.50000000000000  0.00000000000000  0.00000000000000  35.453
    7 Cl  0.00000000000000  0.50000000000000  0.00000000000000  35.453
    8 Cl  0.00000000000000  0.00000000000000  0.50000000000000  35.453
----------------------------------------------------------------------------
Supercell (dim): [2 2 2]
Primitive matrix:
  [0.  0.5 0.5]
  [0.5 0.  0.5]
  [0.5 0.5 0. ]
Displacement distance: 0.03
Number of displacements: 100
NAC parameters were read from "BORN".
Spacegroup: Fm-3m (225)
Displacement dataset was written in "phono3py_disp.yaml".
-------------------------[time 2024-09-19 14:40:00]-------------------------
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

Forces and energies are then calculated for the generated supercells. Here, the VASP code is assumed. Once the calculations are complete, the data (forces and energies) can be extracted using the following command:

% phono3py --sp --cf3 vasprun_xmls/vasprun-{00001..00100}.xml

This command extracts the necessary data and stores it in the phono3py_params.yaml file. For more details, refer to the description of the {ref}`--sp <sp_option>` option. Currently, supercell energy extraction from calculator outputs is only supported when using the VASP interface.
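The brace expression `vasprun-{00001..00100}.xml` is expanded by the shell into 100 zero-padded file names. Where brace expansion is not available, the same argument list can be built in Python. The following is a minimal sketch that assumes the directory and file naming of the command above; the helper name `vasprun_paths` is hypothetical and not part of phono3py.

```python
def vasprun_paths(n_supercells, directory="vasprun_xmls"):
    """Return zero-padded vasprun file names for supercells 1..n_supercells."""
    return [f"{directory}/vasprun-{i:05d}.xml" for i in range(1, n_supercells + 1)]


paths = vasprun_paths(100)

# Argument list equivalent to the shell brace expansion in the command above.
command = ["phono3py", "--sp", "--cf3", *paths]
print(command[:4], "...", command[-1])
```

The resulting list could be passed to `subprocess.run(command)` in a directory that contains `phono3py_disp.yaml` and the VASP outputs.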

A set of VASP calculation results is provided in `example/NaCl-rd`. It can be
extracted by

```bash
% tar xvfa ../NaCl-rd/vasprun_xmls.tar.xz
```

Step 4: Development of MLPs

The phono3py_params.yaml file contains the training data required for developing polynomial MLPs when running with the --pypolymlp option.

% phono3py-load --pypolymlp phono3py_params.yaml
        _                      _____
  _ __ | |__   ___  _ __   ___|___ / _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ |_ \| '_ \| | | |
 | |_) | | | | (_) | | | | (_) |__) | |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___/____/| .__/ \__, |
 |_|                                |_|    |___/
                                      3.18.0

-------------------------[time 2025-07-26 13:59:10]-------------------------
Compiled with OpenMP support (max 10 threads).
Running in phono3py.load mode.
Python version 3.13.3
Spglib version 2.6.1
----------------------------- General settings -----------------------------
Run mode: pypolymlp
HDF5 data compression filter: gzip
Crystal structure was read from "phono3py_params.yaml".
Supercell (dim): [2 2 2]
Primitive matrix:
  [0.  0.5 0.5]
  [0.5 0.  0.5]
  [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Use -v option to watch primitive cell, unit cell, and supercell structures.
NAC parameters were read from "phono3py_params.yaml".
Displacement dataset for fc3 was read from "phono3py_params.yaml".
----------------------------- pypolymlp start ------------------------------
Pypolymlp version 0.12.9
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Parameters:
  cutoff: 8.0
  model_type: 3
  max_p: 2
  gtinv_order: 3
  gtinv_maxl: (8, 8)
  gaussian_params1: (1.0, 1.0, 1)
  gaussian_params2: (0.0, 7.0, 10)
Developing MLPs by pypolymlp...
MLPs were written into "polymlp.yaml"
------------------------------ pypolymlp end -------------------------------
Generate displacements (--rd or -d) for proceeding to phonon calculations.
Summary of calculation was written in "phono3py.yaml".
-------------------------[time 2025-07-26 14:00:12]-------------------------
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

Information about the development of MLPs using pypolymlp is provided between the pypolymlp start and pypolymlp end sections. The polynomial MLPs are saved in the polymlp.yaml file, which can be reused in subsequent phono3py executions with the --pypolymlp option when only displacements (and no forces) are provided.
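The reuse of polymlp.yaml can be pictured as a simple decision: load the saved MLPs if the file is present, otherwise develop them from training data. The sketch below only mirrors this described behavior and is not phono3py's implementation. The function name `choose_mlp_action`, the precedence of the saved file over training data, and the "error" branch are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

MLP_FILE = "polymlp.yaml"


def choose_mlp_action(workdir, has_training_data):
    """Return the action a --pypolymlp run is expected to take (illustrative only)."""
    if (Path(workdir) / MLP_FILE).exists():
        return "load"     # saved MLPs are reused; forces are not needed in the input
    if has_training_data:
        return "develop"  # MLPs are trained and polymlp.yaml is written
    return "error"        # assumption: nothing to load and nothing to train on


with tempfile.TemporaryDirectory() as workdir:
    before = choose_mlp_action(workdir, has_training_data=True)
    # Placeholder file standing in for MLPs saved by a previous run.
    (Path(workdir) / MLP_FILE).write_text("placeholder, not a real MLP file\n")
    after = choose_mlp_action(workdir, has_training_data=False)

print(before, after)  # develop load
```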

(systematic displacements)

Steps 5-7: Force constants calculation (systematic displacements in step 5)

With the -d option, displacements are systematically generated while taking crystal symmetry into account. When running with the --pypolymlp option, MLPs are read from polymlp.yaml if the file exists. In this case, training data is no longer required, and files such as phono3py.yaml can be used as the input structure file.

% phono3py-load --pypolymlp -d phono3py.yaml
        _                      _____
  _ __ | |__   ___  _ __   ___|___ / _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ |_ \| '_ \| | | |
 | |_) | | | | (_) | | | | (_) |__) | |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___/____/| .__/ \__, |
 |_|                                |_|    |___/
                                      3.18.0

-------------------------[time 2025-07-26 14:00:49]-------------------------
Compiled with OpenMP support (max 10 threads).
Running in phono3py.load mode.
Python version 3.13.3
Spglib version 2.6.1
----------------------------- General settings -----------------------------
Run mode: pypolymlp + force constants
HDF5 data compression filter: gzip
Crystal structure was read from "phono3py.yaml".
Supercell (dim): [2 2 2]
Primitive matrix:
  [0.  0.5 0.5]
  [0.5 0.  0.5]
  [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Use -v option to watch primitive cell, unit cell, and supercell structures.
NAC parameters were read from "phono3py.yaml".
----------------------------- pypolymlp start ------------------------------
Pypolymlp version 0.12.9
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Load MLPs from "polymlp.yaml".
------------------------------ pypolymlp end -------------------------------
Generate displacements
  Displacement distance: 0.01
Evaluate forces in 292 supercells by pypolymlp
Dataset generated using MLPs was written in "phono3py_mlp_eval_dataset.yaml".
----------------------------- Force constants ------------------------------
Computing fc3[ 1, x, x ] using numpy.linalg.pinv.
Displacements (in Angstrom):
    [ 0.0100  0.0000  0.0000]
    [-0.0100  0.0000  0.0000]
Computing fc3[ 33, x, x ] using numpy.linalg.pinv.
Displacements (in Angstrom):
    [ 0.0100  0.0000  0.0000]
    [-0.0100  0.0000  0.0000]
Expanding fc3.
Symmetrizing fc3 by symfc projector.
Symfc version 1.5.3 (https://github.com/symfc/symfc)
Citation: A. Seko and A. Togo, Phys. Rev. B, 110, 214302 (2024)
Symmetrizing fc2 by symfc projector.
Symfc version 1.5.3 (https://github.com/symfc/symfc)
Citation: A. Seko and A. Togo, Phys. Rev. B, 110, 214302 (2024)
Max drift of fc3: 0.00000000 (zyz) 0.00000000 (yzz) 0.00000000 (yzz)
Max drift of fc2: -0.00000000 (yy) -0.00000000 (yy)
fc3 was written into "fc3.hdf5".
fc2 was written into "fc2.hdf5".
--------------------------- Calculation settings ---------------------------
Non-analytical term correction (NAC): True
NAC unit conversion factor:  14.39965
BZ integration: Tetrahedron-method
Temperatures: 0.0  300.0
Cutoff frequency: 0.01
Frequency conversion factor to THz:  15.63330
----------- None of ph-ph interaction calculation was performed. -----------
Summary of calculation was written in "phono3py.yaml".
-------------------------[time 2025-07-26 14:00:58]-------------------------
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

After the MLPs are read, systematic displacements, such as those involving the displacement of one or two atoms in supercells, are generated with a displacement distance of 0.01 Angstrom. The forces for these supercells are then evaluated using pypolymlp. Both the generated displacements and the corresponding forces are stored in the phono3py_mlp_eval_dataset.yaml file.

Steps 5-7: Force constants calculation (random displacements in step 5)

Random displacements are generated by specifying the {ref}`--rd <random_displacements_option>` option. When running with the --pypolymlp option, MLPs are read from polymlp.yaml if the file exists. In this case, training data is no longer required, and files such as phono3py.yaml can be used as the input structure file.

% phono3py-load --pypolymlp --rd auto phono3py.yaml
        _                      _____
  _ __ | |__   ___  _ __   ___|___ / _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ |_ \| '_ \| | | |
 | |_) | | | | (_) | | | | (_) |__) | |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___/____/| .__/ \__, |
 |_|                                |_|    |___/
                                      3.18.0

-------------------------[time 2025-07-26 14:02:24]-------------------------
Compiled with OpenMP support (max 10 threads).
Running in phono3py.load mode.
Python version 3.13.3
Spglib version 2.6.1
----------------------------- General settings -----------------------------
Run mode: pypolymlp + force constants
HDF5 data compression filter: gzip
Crystal structure was read from "phono3py.yaml".
Supercell (dim): [2 2 2]
Primitive matrix:
  [0.  0.5 0.5]
  [0.5 0.  0.5]
  [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Use -v option to watch primitive cell, unit cell, and supercell structures.
NAC parameters were read from "phono3py.yaml".
----------------------------- pypolymlp start ------------------------------
Pypolymlp version 0.12.9
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Load MLPs from "polymlp.yaml".
------------------------------ pypolymlp end -------------------------------
Generate random displacements
  Twice of number of snapshots will be generated for plus-minus displacements.
  Displacement distance: 0.01
Evaluate forces in 32 supercells by pypolymlp
Dataset generated using MLPs was written in "phono3py_mlp_eval_dataset.yaml".
----------------------------- Force constants ------------------------------
Type-II dataset for displacements and forces was provided,
but the selected force constants calculator cannot process it.
Use another force constants calculator, e.g., symfc,
to generate force constants.
Try symfc to handle general (or random) displacements.
-------------------------------- Symfc start -------------------------------
Symfc version 1.5.3 (https://github.com/symfc/symfc)
Citation: A. Seko and A. Togo, Phys. Rev. B, 110, 214302 (2024)
Computing [2, 3] order force constants.
Increase log-level to watch detailed symfc log.
--------------------------------- Symfc end --------------------------------
Max drift of fc3: -0.00000000 (xyx) -0.00000000 (yxx) -0.00000000 (yxx)
Max drift of fc2: -0.00000000 (yy) -0.00000000 (yy)
fc3 was written into "fc3.hdf5".
fc2 was written into "fc2.hdf5".
--------------------------- Calculation settings ---------------------------
Non-analytical term correction (NAC): True
NAC unit conversion factor:  14.39965
BZ integration: Tetrahedron-method
Temperatures: 0.0  300.0
Cutoff frequency: 0.01
Frequency conversion factor to THz:  15.63330
----------- None of ph-ph interaction calculation was performed. -----------
Summary of calculation was written in "phono3py.yaml".
-------------------------[time 2025-07-26 14:02:29]-------------------------
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

After the MLPs are read, 16 supercells with random directional displacements are generated by the option --rd auto. These displacements are then inverted (i.e., $\Delta \mathbf{u}_i$ becomes $-\Delta \mathbf{u}_i$ for all atoms $i$ in each supercell), resulting in an additional 16 supercells. In total, 32 supercells are created. The forces for these supercells are then evaluated using the MLPs. Finally, the force constants are calculated using symfc. The --rd-auto-factor option can change the number of supercells generated.
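The plus-minus construction can be illustrated with a short, self-contained sketch. It is not phono3py's implementation: the function name, the random number generator, and the ordering of the output (each snapshot immediately followed by its inverse) are assumptions. The 64 atoms correspond to the 2x2x2 supercell of the 8-atom NaCl unit cell used in this example.

```python
import math
import random


def random_plusminus_displacements(n_snapshots, n_atoms, distance=0.01, seed=0):
    """Return 2 * n_snapshots displacement sets (illustrative only).

    Each set holds n_atoms vectors of length `distance` pointing in random
    directions, and every set is followed by its inverted copy.
    """
    rng = random.Random(seed)
    snapshots = []
    for _ in range(n_snapshots):
        plus = []
        for _ in range(n_atoms):
            # A normalized Gaussian vector has a uniformly random direction.
            v = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(x * x for x in v))
            plus.append([distance * x / norm for x in v])
        minus = [[-x for x in u] for u in plus]
        snapshots.extend([plus, minus])
    return snapshots


snapshots = random_plusminus_displacements(n_snapshots=16, n_atoms=64)
print(len(snapshots))  # 32
```

Every atom is displaced by the same distance, and only the directions are random, which matches the "random directional displacements" described above.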

Command options for force constants calculation

After the MLPs are obtained, supercells with displacements are generated, and the forces in these supercells are computed using the MLPs. The displacement distance is controlled by the --amplitude option, with a default value of 0.01 Angstrom. When -d is specified, systematic displacements are introduced. When the --rd option is used, its value specifies the number of supercells with random directional displacements. To improve the accuracy of the force constants, each of these supercells is paired with one having the inverted displacements, so the actual number of generated supercells is twice the specified value.

When atoms in the unit cell have positional degrees of freedom within the crystal symmetry, the --relax-atomic-positions option can relax their positions using MLPs. In the example/AlN-rd case, the following command develops polynomial MLPs and then relaxes atomic positions using these MLPs. The force constants are calculated using supercells with 0.005 Angstrom systematic displacements.

% phono3py-load phonopy_params_mp-661.yaml.xz --pypolymlp --relax-atomic-positions -d

        _                      _____
  _ __ | |__   ___  _ __   ___|___ / _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ |_ \| '_ \| | | |
 | |_) | | | | (_) | | | | (_) |__) | |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___/____/| .__/ \__, |
 |_|                                |_|    |___/
                      3.18.0-dev21+ge26f3ecb

-------------------------[time 2025-07-26 14:29:16]-------------------------
Compiled with OpenMP support (max 10 threads).
Running in phono3py.load mode.
Python version 3.13.3
Spglib version 2.6.1
----------------------------- General settings -----------------------------
Run mode: pypolymlp + force constants
HDF5 data compression filter: gzip
Crystal structure was read from "phonopy_params_mp-661.yaml.xz".
Supercell (dim): [4 4 2]
Primitive matrix:
  [1. 0. 0.]
  [0. 1. 0.]
  [0. 0. 1.]
Spacegroup: P6_3mc (186)
Use -v option to watch primitive cell, unit cell, and supercell structures.
NAC parameters were read from "phonopy_params_mp-661.yaml.xz".
Displacement dataset for fc3 was read from "phonopy_params_mp-661.yaml.xz".
----------------------------- pypolymlp start ------------------------------
Pypolymlp version 0.12.9.post0
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Parameters:
  cutoff: 8.0
  model_type: 3
  max_p: 2
  gtinv_order: 3
  gtinv_maxl: (8, 8)
  gaussian_params1: (1.0, 1.0, 1)
  gaussian_params2: (0.0, 7.0, 10)
Developing MLPs by pypolymlp...
MLPs were written into "polymlp.yaml"
------------------------------ pypolymlp end -------------------------------
Relaxing atomic positions using polynomial MLPs...
Change in fractional position and in distance:
  1 N :  0.00000000  0.00000000 -0.00000021 (|d|=0.00000105)
  2 N :  0.00000000  0.00000000 -0.00000021 (|d|=0.00000105)
  3 Al:  0.00000000  0.00000000  0.00000021 (|d|=0.00000105)
  4 Al:  0.00000000  0.00000000  0.00000021 (|d|=0.00000105)
----------------------------------------------------------------------------
Generate displacements
  Displacement distance: 0.005
Evaluate forces in 3720 supercells by pypolymlp
Dataset generated using MLPs was written in "phono3py_mlp_eval_dataset.yaml".
----------------------------- Force constants ------------------------------
Computing fc3[ 1, x, x ] using numpy.linalg.pinv.
Displacements (in Angstrom):
    [ 0.0050  0.0000  0.0000]
    [-0.0050  0.0000  0.0000]
    [ 0.0000  0.0000  0.0050]
    [ 0.0000  0.0000 -0.0050]
Computing fc3[ 65, x, x ] using numpy.linalg.pinv.
Displacements (in Angstrom):
    [ 0.0050  0.0000  0.0000]
    [-0.0050  0.0000  0.0000]
    [ 0.0000  0.0000  0.0050]
    [ 0.0000  0.0000 -0.0050]
Expanding fc3.
Symmetrizing fc3 by symfc projector.
Symfc version 1.5.3 (https://github.com/symfc/symfc)
Citation: A. Seko and A. Togo, Phys. Rev. B, 110, 214302 (2024)
Symmetrizing fc2 by symfc projector.
Symfc version 1.5.3 (https://github.com/symfc/symfc)
Citation: A. Seko and A. Togo, Phys. Rev. B, 110, 214302 (2024)
Max drift of fc3: -0.00000000 (xxz) -0.00000000 (xxz) -0.00000000 (xzx)
Max drift of fc2: -0.00000000 (xx) -0.00000000 (xx)
fc3 was written into "fc3.hdf5".
fc2 was written into "fc2.hdf5".
--------------------------- Calculation settings ---------------------------
Non-analytical term correction (NAC): True
NAC unit conversion factor:  14.39965
BZ integration: Tetrahedron-method
Temperatures: 0.0  300.0
Cutoff frequency: 0.01
Frequency conversion factor to THz:  15.63330
----------- None of ph-ph interaction calculation was performed. -----------
Summary of calculation was written in "phono3py.yaml".
-------------------------[time 2025-07-26 14:39:11]-------------------------
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

Parameters for developing MLPs

A few parameters can be specified using the --mlp-params option for the development of MLPs. The parameters are provided as a string, e.g.,

% phono3py-load phono3py_params.yaml --pypolymlp --mlp-params="ntrain=80, ntest=20"

Parameters are separated by commas. A brief explanation of the available parameters is given in the docstring of PypolymlpParams, which can be displayed by

In [1]: from phonopy.interface.pypolymlp import PypolymlpParams

In [2]: help(PypolymlpParams)

ntrain and ntest are implemented in phono3py, while the remaining parameters are directly passed to pypolymlp. Optimizing pypolymlp parameters can be difficult, both in terms of achieving accuracy and managing the computational resources required. The current default parameters are likely suitable for systems up to ternary compounds. For binary systems, the calculations can generally be run on standard laptop computers, but for ternary systems, around 40 GB of memory or more may be necessary.

For parameter adjustments, it is recommended to consult the pypolymlp documentation and review the relevant research papers.
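The `key=value` string format passed to --mlp-params can be illustrated with a small parser. This is a hypothetical sketch, not the parser used by phono3py: it keeps all values as strings and splits on every comma, so it would not handle values that themselves contain commas (for example tuple-valued parameters).

```python
def parse_mlp_params(text):
    """Split 'key=value, key=value' into a dict of strings (illustrative only)."""
    params = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        params[key.strip()] = value.strip()
    return params


params = parse_mlp_params("ntrain=80, ntest=20")
print(params)  # {'ntrain': '80', 'ntest': '20'}
```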

ntrain and ntest

These parameters provide a straightforward dataset split: the first ntrain supercells from the list are used for training, while the last ntest supercells are reserved for testing.
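The split described above amounts to slicing the list of supercells from both ends. The following sketch is illustrative and not phono3py's code. The function name is hypothetical, and the rounding used for the default 90%/10% split is an assumption, since the exact rule is not stated here.

```python
def split_dataset(supercells, ntrain=None, ntest=None):
    """Return (train, test): first ntrain and last ntest items (illustrative only)."""
    n = len(supercells)
    if ntrain is None and ntest is None:
        # Default 90%/10% split; the rounding rule is an assumption.
        ntest = n // 10
        ntrain = n - ntest
    elif ntrain is None or ntest is None:
        raise ValueError("specify both ntrain and ntest, or neither")
    if ntrain + ntest > n:
        raise ValueError("ntrain + ntest exceeds the dataset size")
    train = supercells[:ntrain]
    test = supercells[n - ntest:]
    return train, test


supercell_ids = list(range(1, 101))  # 100 supercells numbered from 1
train, test = split_dataset(supercell_ids, ntrain=20, ntest=20)
print(train[0], train[-1], test[0], test[-1])  # 1 20 81 100
```

Because the test set is taken from the end of the list, it stays the same when only ntrain is varied, provided ntrain + ntest does not exceed the dataset size.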

Convergence with respect to dataset size

In general, increasing the amount of data improves the accuracy of representing force constants. Therefore, it is recommended to check the convergence of the target property with respect to the number of supercells in the training dataset. Lattice thermal conductivity may be a convenient property to monitor when assessing convergence.

For example, by preparing an initial set with 100 supercell data, calculations can then be performed by varying the size of the training dataset while keeping the test dataset unchanged as follows:

% phono3py-load --pypolymlp --mlp-params="ntrain=20, ntest=20" --br --mesh 40 phono3py_params.yaml | tee log-20
% phono3py-load --pypolymlp --mlp-params="ntrain=40, ntest=20" --br --mesh 40 phono3py_params.yaml | tee log-40
% phono3py-load --pypolymlp --mlp-params="ntrain=60, ntest=20" --br --mesh 40 phono3py_params.yaml | tee log-60
% phono3py-load --pypolymlp --mlp-params="ntrain=80, ntest=20" --br --mesh 40 phono3py_params.yaml | tee log-80

The computed lattice thermal conductivities (LTCs) are plotted against the size of the training dataset to observe LTC convergence. If the LTC has not converged, an additional set of supercell data (e.g., forces and energies in the next 100 supercells) will be computed and included. With this procedure in mind, it may be convenient to generate a sufficiently large number of supercells with random displacements in advance, such as 1000 supercells, before starting the LTC calculation with pypolymlp.
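The series of runs above and a simple convergence check could be scripted as follows. This is a sketch under stated assumptions: both function names are hypothetical, the command strings only reproduce the commands listed above, and the 5% relative tolerance is an arbitrary illustrative threshold rather than a recommendation. LTC values would have to be read from the calculation results by the user.

```python
def convergence_commands(ntrain_values, ntest=20, mesh=40,
                         params_file="phono3py_params.yaml"):
    """Build one phono3py-load command string per training-set size."""
    return [
        f'phono3py-load --pypolymlp --mlp-params="ntrain={ntrain}, ntest={ntest}" '
        f"--br --mesh {mesh} {params_file} | tee log-{ntrain}"
        for ntrain in ntrain_values
    ]


def is_converged(ltc_values, rtol=0.05):
    """True if the last two LTC values differ by at most rtol (relative)."""
    if len(ltc_values) < 2:
        return False
    previous, latest = ltc_values[-2], ltc_values[-1]
    return abs(latest - previous) <= rtol * abs(latest)


commands = convergence_commands([20, 40, 60, 80])
for command in commands:
    print(command)
```

`is_converged` takes the LTC values ordered by increasing training-set size and compares only the last two, which is a deliberately simple criterion.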

Converting phono3py.pmlp to polymlp.yaml

In older versions, polynomial MLPs were stored in phono3py.pmlp. This file can be converted to polymlp.yaml using the following Python snippet.

from pypolymlp.mlp_dev.pypolymlp import Pypolymlp

# Read the legacy text-format MLP file and write it out as polymlp.yaml.
polymlp = Pypolymlp()
polymlp.convert_to_yaml(filename_txt="phono3py.pmlp", filename_yaml="polymlp.yaml")