light_pfp_autogen.utils.check_md_log(log_file: Path) → bool

Check the MD log file.
Return True if the MD run has finished, False otherwise.

light_pfp_autogen.utils.check_model_accuracy(model_id: str) → Tuple[float, float, float]

Check the accuracy of the model.
Return the mean absolute errors (MAE) of the model as a tuple of three floats.

light_pfp_autogen.utils.check_train_log(log_file: Path) → bool

Check the training log file and verify that the previous training job finished successfully.

light_pfp_autogen.utils.check_training_job_status(job_id: str) → bool

Check the status of the training job.

Return True if the job has finished, False if it has failed.

light_pfp_autogen.utils.count_num_atoms(datasets_list: List[Path]) → int

Count the number of atoms in the given list of datasets.

Parameters

datasets_list (List[Path]) – A list of paths to the datasets.

Returns

The total number of atoms in all the datasets.

Return type

int
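
As a rough illustration of the aggregation described above, the sketch below sums atom counts over several files. The on-disk format is an assumption made purely for the example (one JSON object per line, with a hypothetical `numbers` list holding one atomic number per atom); the format actually read by `count_num_atoms` is not documented here, and `count_atoms_jsonl` is an illustrative name, not part of `light_pfp_autogen`.

```python
import json
from pathlib import Path
from typing import List


def count_atoms_jsonl(datasets_list: List[Path]) -> int:
    """Sum atoms over all structures in all files (hypothetical JSON-lines format)."""
    total = 0
    for path in datasets_list:
        with path.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                structure = json.loads(line)
                # Assumed layout: "numbers" holds one atomic number per atom.
                total += len(structure["numbers"])
    return total
```

An empty list of datasets yields 0 in this sketch, which makes the result safe to pass on to a later estimate without special-casing.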

light_pfp_autogen.utils.estimate_epoch(datasets_list: List[Path], training_time: float = 0.5) → int

Estimate the number of epochs based on the number of atoms in the dataset and
expected training time in hours.

Parameters
  • datasets_list (List[Path]) – The list of dataset files.

  • training_time (float, optional) – The expected training time in hours. Defaults to 0.5.
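
The formula used by `estimate_epoch` is not given here. The sketch below shows one plausible way such an estimate could be made, assuming training cost scales linearly with the number of atoms processed per epoch. The throughput constant `ATOMS_PER_HOUR`, its value, and the function name `estimate_epoch_sketch` are hypothetical placeholders, not values or names taken from `light_pfp_autogen`.

```python
# Hypothetical throughput: atoms the trainer can process per hour.
# This number is a placeholder for illustration, not a measured value.
ATOMS_PER_HOUR = 1_000_000


def estimate_epoch_sketch(num_atoms: int, training_time: float = 0.5) -> int:
    """Estimate how many epochs fit into `training_time` hours.

    Assumes one epoch visits every atom once and cost is linear in atom count.
    """
    if num_atoms <= 0:
        raise ValueError("num_atoms must be positive")
    if training_time <= 0:
        raise ValueError("training_time must be positive")
    # Total number of atoms that can be processed within the time budget.
    budget = ATOMS_PER_HOUR * training_time
    # Always train for at least one epoch, even on very large datasets.
    return max(1, int(budget // num_atoms))
```

Under these assumptions, a larger dataset yields fewer epochs for the same time budget, and the result never drops below one epoch.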

light_pfp_autogen.utils.get_model_id_from_log(log_file: Path) → str

Get the model ID from the training log file.

light_pfp_autogen.utils.md_log_statistic(log_file: Path) → Tuple[int, int, int, int, int, float, float, float]

light_pfp_autogen.utils.submit_training_job(training_args: TrainConfig, datasets_list: List[Path], task_name: str) → str

Submit a training job.