- class light_pfp_data.utils.dataset.H5DatasetWriter(h5_file: Union[str, Path, File], mode: str = 'copy')
-
Bases:
object
A class to save the training structure and the corresponding potential energy,
forces, and stress into HDF5 files.
- add_item(model_version: str, calc_mode: str, atoms: Atoms, potential_energy: float, forces: ndarray, stress: Optional[ndarray] = None, **kwargs) → None
-
Add one item into the dataset.
- Parameters
-
-
model_version (str) – The PFP version used to obtain the potential energy.
-
calc_mode (str) – The PFP calculation mode used to obtain the potential energy.
-
atoms (Atoms) – The input ASE atoms.
-
potential_energy (float) – The potential energy.
-
forces (np.ndarray) – The forces.
-
stress (np.ndarray, optional) – The virial stress. Defaults to None.
-
- get_atoms(key: str, add_calc: bool = False) → Atoms
-
Get ASE atoms object from one item.
- Parameters
-
-
key (str) – The key of the item.
-
add_calc (bool, optional) – Whether to attach a calculator that holds the stored
potential energy, forces, etc. Defaults to False.
-
- Returns
-
The ASE atoms object.
- Return type
-
Atoms
- property n_items: int
- recalculate(model_version: str, calc_mode: str, show_progress_bar: bool = False, executor: Optional[ThreadPoolExecutor] = None, num_threads: int = 8, max_retries: int = 0) → List[Future]
-
Recalculate the potential energy, forces, and stress of all items in the dataset
with the given model version and calculation mode.
- Parameters
-
-
model_version (str) – The PFP version.
-
calc_mode (str) – The PFP calculation mode.
-
show_progress_bar (bool, optional) – Show progress bar. Defaults to False.
-
executor (ThreadPoolExecutor, optional) – Thread pool executor for parallel calculation. Defaults to None.
-
num_threads (int, optional) – Max number of threads to use for executor if no executor is passed. Defaults to 8.
-
max_retries (int, optional) – Max retries for PFP calculation. Defaults to 0.
-
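The executor, num_threads, and max_retries parameters above suggest a submit-and-collect pattern built on concurrent.futures. The following is a minimal sketch of that general pattern using only the standard library. It is not the implementation of recalculate: the helper submit_with_retries and the FlakyTask stand-in are hypothetical names introduced here, and the retry behavior (one initial attempt plus up to max_retries further attempts) is an assumption.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


def submit_with_retries(
    tasks: List[Callable[[], float]],
    executor: Optional[ThreadPoolExecutor] = None,
    num_threads: int = 8,
    max_retries: int = 0,
) -> List[Future]:
    """Hypothetical helper: submit each task to a thread pool, return the futures."""

    def run(task: Callable[[], float]) -> float:
        # Assumption: one initial attempt plus up to max_retries further attempts.
        for attempt in range(max_retries + 1):
            try:
                return task()
            except Exception:
                if attempt == max_retries:
                    raise
        raise AssertionError("unreachable for max_retries >= 0")

    # Create a pool only when the caller did not pass one.
    pool = executor if executor is not None else ThreadPoolExecutor(max_workers=num_threads)
    futures = [pool.submit(run, task) for task in tasks]
    if executor is None:
        # Stop accepting new work without blocking; submitted tasks still run.
        pool.shutdown(wait=False)
    return futures


class FlakyTask:
    """Stand-in for one calculation that fails a fixed number of times first."""

    def __init__(self, value: float, failures: int) -> None:
        self.value = value
        self.failures = failures

    def __call__(self) -> float:
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("transient failure")
        return self.value


futures = submit_with_retries(
    [FlakyTask(1.0, failures=0), FlakyTask(2.0, failures=1)],
    num_threads=2,
    max_retries=1,
)
# Future.result() blocks until the task finishes and re-raises its exception, if any.
results = [future.result() for future in futures]
```

With a returned list of futures, the caller decides when to block: calling result() on each future waits for that calculation and surfaces any error that survived the retries.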
- update_item(key: str, **kwargs)
-
Update one item in the dataset by overwriting the old values with new ones.
- Parameters
-
key (str) – The key of the item.
- light_pfp_data.utils.dataset.check_quality(datasets: List[Union[H5DatasetWriter, File]], max_energy: float = 0.0, max_forces: float = 20.0, delete_invalid_keys: bool = False, print_info: bool = True) → bool
-
Checks the quality of a group of datasets.
Specifically, this function checks whether the calculation parameters
(model_version and calc_mode) are consistent both within each dataset and across datasets.
If inconsistent values are found, this group of datasets is unsuitable for training.
This function also checks that all items in a dataset are below the specified maximum
energy and force thresholds; an item that exceeds a threshold is likely unsuitable for training.
- Parameters
-
-
datasets (list[H5DatasetWriter or File]) – The list of datasets to check.
-
max_energy (float, optional) – The threshold for acceptable maximum energy per atom (eV/atom).
Defaults to 0.0.
-
max_forces (float, optional) – The threshold for acceptable maximum force (eV/angstrom).
Defaults to 20.0.
-
delete_invalid_keys (bool, optional) – Whether to delete invalid keys. Defaults to False.
-
- Returns
-
Whether the quality of the given group of datasets is suitable for training.
- Return type
-
bool
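The two checks described for check_quality (consistent calculation parameters, and energy and force thresholds) can be illustrated with plain Python. This is a rough sketch under stated assumptions, not the library's code: the helper names, the dictionary layout of an item, and the placeholder values "version_a" and "mode_a" are all hypothetical, and it assumes that "maximum force" means the largest per-atom force norm.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical in-memory stand-in for dataset items, keyed by item key.
Items = Dict[str, dict]


def max_force_norm(forces: Sequence[Sequence[float]]) -> float:
    """Largest Euclidean norm among the per-atom force vectors (assumed metric)."""
    return max(math.sqrt(sum(c * c for c in force)) for force in forces)


def parameters_consistent(items: Items) -> bool:
    """True if every item shares one (model_version, calc_mode) pair."""
    pairs = {(item["model_version"], item["calc_mode"]) for item in items.values()}
    return len(pairs) <= 1


def find_invalid_keys(
    items: Items, max_energy: float = 0.0, max_forces: float = 20.0
) -> List[str]:
    """Keys whose energy per atom or maximum force exceeds the thresholds."""
    invalid = []
    for key, item in items.items():
        n_atoms = len(item["forces"])
        energy_per_atom = item["potential_energy"] / n_atoms
        if energy_per_atom > max_energy or max_force_norm(item["forces"]) > max_forces:
            invalid.append(key)
    return invalid


common = {"model_version": "version_a", "calc_mode": "mode_a"}  # placeholder values
items = {
    "ok": {**common, "potential_energy": -8.0,
           "forces": [[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]]},
    "high_force": {**common, "potential_energy": -6.0,
                   "forces": [[30.0, 0.0, 0.0], [-30.0, 0.0, 0.0]]},
    "high_energy": {**common, "potential_energy": 2.0,
                    "forces": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]},
}

invalid_keys = find_invalid_keys(items)
consistent = parameters_consistent(items)
```

In this sketch an item is flagged when either threshold is exceeded, mirroring the defaults in the signature (0.0 eV/atom and 20.0 eV/angstrom).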
- light_pfp_data.utils.dataset.dataset_dist_analysis(h5_file: File, filename: str) → None
-
Analyze a dataset and draw the distribution of energy and forces.
- Parameters
-
-
h5_file (File) – The dataset in h5py.File format.
-
filename (str) – The path to save the output figure.
-