New Feature: LightPFP

What is LightPFP?

LightPFP is a feature for building lightweight machine learning potentials tailored to specific research targets, using predictions from PFP, the general-purpose machine learning interatomic potential (MLIP) of Matlantis, as training data.
Because LightPFP models are much lighter than PFP, they run inference faster and raise the maximum simulation size from tens of thousands of atoms to hundreds of thousands. This makes it possible to simulate more complex phenomena and larger systems that PFP alone could not handle.

In general, constructing a machine learning potential requires generating large amounts of training data through quantum chemical calculations, a process that often takes several months or more. However, by leveraging PFP, which combines speed, accuracy, and versatility, it is possible to build high-performance potentials in just a few days.

Stage	Traditional machine learning force field construction	Machine learning force field construction using LightPFP
Data construction	Prepare DFT calculation results (500 GPU-days)	Prepare PFP calculation results on Matlantis (<1 GPU-day)
Training	Depends on computing resources (~3 GPU-days)	Use Matlantis environment resources and pre-trained models (<1 GPU-day)
Evaluation	Comparison with DFT calculation results and with experimental results	Comparison with PFP calculation results and with experimental results
Time required to complete construction	A few months	A few days
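In the LightPFP workflow, the evaluation step compares the lightweight model against PFP. Below is a minimal sketch of one such comparison. It assumes the PFP and LightPFP energies and forces for the same set of structures have already been gathered into plain Python lists, and it reports the per-atom energy RMSE and the force-component RMSE. The function names are hypothetical and are not part of the LightPFP API.

```python
import math
from typing import Sequence

Vector = Sequence[float]  # (fx, fy, fz) for one atom, in eV/Å


def energy_rmse_per_atom(e_ref: Sequence[float], e_model: Sequence[float],
                         n_atoms: Sequence[int]) -> float:
    """RMSE of per-atom energy errors (eV/atom) over a set of structures."""
    sq = [((em - er) / n) ** 2 for er, em, n in zip(e_ref, e_model, n_atoms)]
    return math.sqrt(sum(sq) / len(sq))


def force_rmse(f_ref: Sequence[Sequence[Vector]],
               f_model: Sequence[Sequence[Vector]]) -> float:
    """RMSE over every Cartesian force component of every atom (eV/Å)."""
    sq = [(a - b) ** 2
          for struct_ref, struct_model in zip(f_ref, f_model)
          for atom_ref, atom_model in zip(struct_ref, struct_model)
          for a, b in zip(atom_ref, atom_model)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical numbers for two structures with 2 and 4 atoms.
pfp_energies = [-10.0, -20.0]    # eV, reference (PFP)
model_energies = [-9.9, -20.2]   # eV, lightweight model
print(energy_rmse_per_atom(pfp_energies, model_energies, [2, 4]))  # ≈ 0.05 eV/atom
```

The energy error is normalized per atom so that structures of different sizes carry comparable weight.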

Features That Support LightPFP

In LightPFP, we offer the Moment Tensor Potential (MTP) architecture for building models trained on data generated by PFP. MTP strikes an excellent balance between computational efficiency and accuracy and requires only a small number of parameters (typically between 100 and 1,000), so models stay lightweight and inference is fast. Even so, as with any machine learning potential, building an MTP model still involves time-consuming steps such as data generation and model training. To reduce this burden, LightPFP includes a number of supporting features. The following sections introduce some of the key ones.
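For reference, the MTP functional form from the literature (Shapeev, 2016) is summarized below. LightPFP's exact basis set, cutoff, and parameter counts are not specified here and may differ. The total energy is a sum of site energies, and each site energy is a linear combination of basis functions built from moment tensors of the atomic neighborhood 𝔫_i of atom i:

```latex
E = \sum_{i=1}^{N} V(\mathfrak{n}_i), \qquad
V(\mathfrak{n}_i) = \sum_{\alpha} \xi_\alpha \, B_\alpha(\mathfrak{n}_i)

M_{\mu,\nu}(\mathfrak{n}_i) = \sum_{j} f_\mu\big(|\mathbf{r}_{ij}|, z_i, z_j\big)\,
  \underbrace{\mathbf{r}_{ij} \otimes \cdots \otimes \mathbf{r}_{ij}}_{\nu\ \text{times}},
\qquad
f_\mu(r, z_i, z_j) = \sum_{\beta} c^{(\beta)}_{\mu, z_i, z_j} \, Q^{(\beta)}(r)
```

Here z_i is the species of atom i, Q^(β) are radial basis functions that vanish at the cutoff radius, and each B_α is a scalar contraction of products of the moment tensors M_{μ,ν}. The trainable parameters are the linear coefficients ξ_α and the radial coefficients c. Because both sets are limited by the chosen basis size, the total parameter count stays modest.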

Automated Training Data Generation (dataset-generation Tool)

LightPFP includes a dataset-generation tool that automates the creation of training data required for building LightPFP models. It supports a variety of sampling methods—including MD simulations, strain application, and atomic substitution—allowing users to efficiently generate comprehensive training datasets with minimal manual effort.
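Two of the sampling methods named above, strain application and atomic substitution, are sketched below using only the Python standard library. This is a hypothetical illustration, not the dataset-generation tool's interface. It uses a made-up `Structure` class, omits MD sampling, and leaves out the PFP labeling step that would follow. Each variant is produced by deforming the cell with a random symmetric strain and then swapping a fraction of one element for another.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Structure:
    """Hypothetical minimal periodic structure (not a LightPFP or ASE class)."""
    cell: tuple      # three lattice vectors as rows, in Å
    frac: tuple      # fractional coordinates, one (x, y, z) per atom
    symbols: tuple   # chemical symbols, one per atom


def apply_strain(s: Structure, max_strain: float, rng: random.Random) -> Structure:
    """Deform the cell by (I + eps) for a random symmetric strain tensor eps.

    Fractional coordinates are left unchanged, so atoms follow the cell affinely.
    """
    eps = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):
            eps[i][j] = eps[j][i] = rng.uniform(-max_strain, max_strain)
    deform = [[(1.0 if i == j else 0.0) + eps[i][j] for j in range(3)] for i in range(3)]
    # Each lattice vector a is mapped to (I + eps) @ a.
    new_cell = tuple(
        tuple(sum(deform[r][c] * vec[c] for c in range(3)) for r in range(3))
        for vec in s.cell
    )
    return replace(s, cell=new_cell)


def substitute(s: Structure, old: str, new: str, fraction: float,
               rng: random.Random) -> Structure:
    """Replace a random `fraction` of the `old` atoms with element `new`."""
    candidates = [i for i, sym in enumerate(s.symbols) if sym == old]
    chosen = set(rng.sample(candidates, round(fraction * len(candidates))))
    symbols = tuple(new if i in chosen else sym for i, sym in enumerate(s.symbols))
    return replace(s, symbols=symbols)


def generate(base: Structure, n: int, seed: int = 0) -> list:
    """Produce n strained (up to ±3%), partially Ni->Co substituted variants."""
    rng = random.Random(seed)
    return [substitute(apply_strain(base, 0.03, rng), "Ni", "Co", 0.25, rng)
            for _ in range(n)]


# Conventional fcc Ni cell (a = 3.52 Å, 4 atoms) as a hypothetical starting point.
a = 3.52
ni_fcc = Structure(
    cell=((a, 0.0, 0.0), (0.0, a, 0.0), (0.0, 0.0, a)),
    frac=((0, 0, 0), (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)),
    symbols=("Ni",) * 4,
)
configs = generate(ni_fcc, 5)
```

Each generated structure would then be sent to PFP for energies and forces before training.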

Pre-trained Models

Although MTP has relatively few parameters, solving the optimization problem during training is not always straightforward. To help with parameter optimization, LightPFP provides pre-trained models trained on large, diverse datasets that cover a wide range of structures. By selecting a pre-trained model suited to the target system and fine-tuning it, users can significantly reduce training time.
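The toy example below illustrates why starting from a pre-trained model shortens training. It is not LightPFP's training procedure. The sketch assumes a linear model fitted by plain gradient descent on synthetic data. That is a loose stand-in, since the MTP energy is linear in its ξ coefficients once the radial part is fixed, but real training also optimizes nonlinear radial parameters. All names and numbers are hypothetical. The script counts the optimizer steps needed from a zero initialization and from a nearby "pre-trained" starting point.

```python
import itertools


def train_gd(X, y, w0, lr=0.2, tol=1e-8, max_iter=100_000):
    """Gradient descent on the mean squared error of a linear model y ≈ X @ w.

    Returns the fitted weights and the number of iterations needed to bring
    the gradient norm below `tol`.
    """
    w = list(w0)
    n = len(X)
    for it in range(1, max_iter + 1):
        resid = [sum(xk * wk for xk, wk in zip(row, w)) - yi for row, yi in zip(X, y)]
        grad = [2.0 / n * sum(r * row[k] for r, row in zip(resid, X)) for k in range(len(w))]
        if sum(g * g for g in grad) ** 0.5 < tol:
            return w, it
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w, max_iter


# Hypothetical toy data: a 2^3 design with differently scaled features,
# so the loss surface is anisotropic and one direction converges slowly.
X = [[1.0 * s1, 0.5 * s2, 2.0 * s3]
     for s1, s2, s3 in itertools.product((-1.0, 1.0), repeat=3)]
w_target = [1.5, -2.0, 0.7]                 # system-specific optimum (synthetic)
y = [sum(xk * wk for xk, wk in zip(row, w_target)) for row in X]

w_pretrained = [1.49, -1.98, 0.71]          # hypothetical pre-trained parameters
w_scratch, iters_scratch = train_gd(X, y, [0.0, 0.0, 0.0])
w_finetune, iters_finetune = train_gd(X, y, w_pretrained)
print("from scratch:", iters_scratch, "iterations; fine-tuned:", iters_finetune)
```

Both runs reach the same optimum. The warm start needs fewer steps because its initial error is about 100 times smaller along every direction, including the slowly converging one.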

Active Learning

LightPFP’s active learning feature enables efficient training data generation by starting from just a small initial dataset. It automatically performs quality checks by comparing results against PFP and identifies gaps in the training data, allowing missing configurations to be sampled effectively. This capability frees users from having to determine the required amount and variety of training data to achieve a target model accuracy.
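The active learning loop described above can be sketched in a one-dimensional toy setting. This sketch assumes a Morse pair energy as a stand-in for PFP and piecewise-linear interpolation as a stand-in for the lightweight model. The candidate pool and tolerance are also hypothetical, and none of this is LightPFP's actual implementation. Starting from two reference points, each round compares the model against the reference on the remaining candidates, adds the worst-described configuration to the training set, and refits. The loop stops once every candidate is within tolerance.

```python
import bisect
import math


def reference_energy(r: float) -> float:
    """Stand-in for PFP: a Morse pair energy in eV (hypothetical parameters), r in Å."""
    d, a, r0 = 1.0, 1.5, 2.5
    return d * (1.0 - math.exp(-a * (r - r0))) ** 2 - d


def fit(points):
    """Stand-in for training a lightweight model: piecewise-linear interpolation."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def predict(r: float) -> float:
        k = min(max(bisect.bisect_left(xs, r), 1), len(xs) - 1)  # segment index
        x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
        return y0 + (y1 - y0) * (r - x0) / (x1 - x0)

    return predict


def active_learning(pool, tol, batch=1, max_rounds=500):
    """Grow the training set until the model matches the reference within tol."""
    train = [(r, reference_energy(r)) for r in (pool[0], pool[-1])]
    remaining = list(pool[1:-1])
    for rounds in range(max_rounds):
        model = fit(train)
        # Quality check: compare model and reference on the remaining candidates.
        errors = sorted(((abs(model(r) - reference_energy(r)), r) for r in remaining),
                        reverse=True)
        if not errors or errors[0][0] < tol:
            return model, train, rounds
        for _, r in errors[:batch]:  # add the worst-described configurations
            train.append((r, reference_energy(r)))
            remaining.remove(r)
    return fit(train), train, max_rounds


pool = [2.0 + 0.01 * i for i in range(401)]  # candidate distances, 2.0 to 6.0 Å
model, train, rounds = active_learning(pool, tol=0.01)
print(len(train), "training points after", rounds, "rounds")
```

Training points end up concentrated where the energy curve bends sharply. The user does not have to decide the required amount or placement of data in advance, and that is the property the paragraph above describes.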

Calculation Cases

As examples of calculations using LightPFP, we present two cases: a solid–liquid interface and a complex alloy.

Pt(111) / benzene solid-liquid interface (84,600 atoms)

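In simulations of catalytic reactions, solid–liquid interfaces are often difficult to model because of system size limitations. Here, we reproduced the dynamics at a Pt/benzene interface using a large-scale model of 84,600 atoms. The results confirmed that LightPFP maintains the accuracy of PFP while greatly improving computational efficiency in long-timescale simulations. For 1,000,000 MD steps, LightPFP took 15.86 hours compared with 408.33 hours for PFP, roughly 26 times faster. For short runs of 1,000 steps, PFP remained faster.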

MD steps	PFP time [h]	LightPFP time [h]
1,000	0.41	7.01
10,000	4.08	7.09
100,000	40.83	7.89
1,000,000	408.33	15.86
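The LightPFP column behaves like a roughly constant offset of about 7 hours plus a much smaller per-step cost. The table does not say what that offset includes. Fitting a straight line to each column quantifies both terms and gives the run length at which the two approaches take equal time. The sketch below is an illustrative analysis of the published numbers only, and its variable names are hypothetical.

```python
# (MD steps, PFP hours, LightPFP hours), copied from the table above.
TIMINGS = [
    (1_000, 0.41, 7.01),
    (10_000, 4.08, 7.09),
    (100_000, 40.83, 7.89),
    (1_000_000, 408.33, 15.86),
]


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


steps = [s for s, _, _ in TIMINGS]
a_pfp, b_pfp = linear_fit(steps, [p for _, p, _ in TIMINGS])
a_light, b_light = linear_fit(steps, [t for _, _, t in TIMINGS])

# Number of MD steps at which both fitted lines predict the same time.
breakeven = (a_light - a_pfp) / (b_pfp - b_light)
print(f"LightPFP offset: {a_light:.1f} h, per-step cost ratio: {b_pfp / b_light:.0f}x")
print(f"break-even: {breakeven:,.0f} MD steps")
print(f"speedup at 1,000,000 steps: {408.33 / 15.86:.1f}x")
```

On these numbers, the LightPFP offset comes out at about 7.0 hours. Each MD step costs about 46 times less than with PFP, and the break-even point lies near 17,500 steps. This is consistent with the table, where PFP is faster at 10,000 steps and LightPFP is faster from 100,000 steps onward.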

High-Entropy Alloy (503,712 atoms)

In general, as the number of constituent elements increases, the potential energy surface becomes more complex, making it increasingly difficult to construct reliable interatomic potentials. In this case, we applied LightPFP to AlCoCrFeNi, a five-element high-entropy alloy. We built a model that maintains PFP-level accuracy for key properties such as lattice constants and the equation of state, and used it to run a large-scale simulation of a polycrystalline high-entropy alloy containing over 500,000 atoms.
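The text does not describe how the equation of state was compared. One common approach, sketched here as an assumption, is to compute energy–volume points with each potential, fit them to the third-order Birch–Murnaghan form, and compare the fitted equilibrium volume V0 and bulk modulus B0. Because that form is a cubic polynomial in x = V^(-2/3), a linear least-squares fit is enough. The sketch uses only the standard library. The data are synthetic and do not come from the study, and all names are hypothetical.

```python
import math


def birch_murnaghan(v, e0, v0, b0, bp):
    """Third-order Birch–Murnaghan energy E(V)."""
    eta = (v0 / v) ** (2.0 / 3.0)
    return e0 + 9.0 * v0 * b0 / 16.0 * ((eta - 1.0) ** 3 * bp
                                        + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta))


def _solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_birch_murnaghan(volumes, energies):
    """Fit E(V) and return (E0, V0, B0); B0 is in eV/Å^3 for eV and Å^3 inputs."""
    xs = [v ** (-2.0 / 3.0) for v in volumes]  # E is cubic in x = V^(-2/3)
    xm = sum(xs) / len(xs)
    s = max(abs(x - xm) for x in xs)           # rescale x to t in [-1, 1]
    ts = [(x - xm) / s for x in xs]
    ata = [[sum(t ** (i + j) for t in ts) for j in range(4)] for i in range(4)]
    aty = [sum(e * t ** i for t, e in zip(ts, energies)) for i in range(4)]
    c0, c1, c2, c3 = _solve(ata, aty)
    # Minimum: root of c1 + 2 c2 t + 3 c3 t^2 = 0 with positive curvature.
    disc = c2 * c2 - 3.0 * c1 * c3
    if disc <= 0.0:
        raise ValueError("fitted curve has no minimum")
    t0 = -c1 / (c2 + math.sqrt(disc))
    e_tt = 2.0 * math.sqrt(disc)               # d2E/dt2 at t0
    v0 = (xm + s * t0) ** -1.5
    e0 = c0 + c1 * t0 + c2 * t0 ** 2 + c3 * t0 ** 3
    # B0 = V d2E/dV2 at V0, using dx/dV = -(2/3) V^(-5/3) and dE/dx = 0 there.
    b0 = (4.0 / 9.0) * (e_tt / s ** 2) * v0 ** (-7.0 / 3.0)
    return e0, v0, b0


# Synthetic E(V) points (eV, Å^3/atom) spanning ±6% around V0 = 11.0.
volumes = [11.0 * (0.94 + 0.015 * i) for i in range(9)]
energies = [birch_murnaghan(v, -5.0, 11.0, 1.0, 4.5) for v in volumes]
e0, v0, b0 = fit_birch_murnaghan(volumes, energies)
print(f"V0 = {v0:.3f} Å^3/atom, B0 = {b0 * 160.21766:.1f} GPa")
```

On this synthetic input, the fit recovers V0 = 11.0 Å^3/atom and B0 of about 160.2 GPa (1 eV/Å^3 ≈ 160.2 GPa). In practice, the same fit would be run on PFP and LightPFP energies and the fitted parameters compared.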

Please contact us for other examples and specific construction methods.
