MD Workflow

The MD workflow is organized as a small sequence of reusable steps.

1. Label trajectories

Use mltsa.md.label_trajectories(...) to classify each trajectory as IN, OUT, or TS based on the mean rule value over the final window_size frames.

2. Build feature sets

Use mltsa.md.featurize_dataset(...) to create one named feature set inside the same MD HDF5 file.

Supported v1 feature families:

  • closest_residue_distances

  • all_ligand_protein_distances

  • bubble_distances

  • contact_map

  • pca_xyz

3. Train and explain

Use mltsa.md.run_mltsa(...) to load a selected feature set, fit a model, compute feature importances, and optionally save results to a separate results HDF5.

4. Export structures

Use mltsa.md.export_state_structures(...) to write multi-model IN.pdb, OUT.pdb, and TS.pdb files from an existing label experiment.

Minimal example

from mltsa.md import featurize_dataset, label_trajectories, run_mltsa

label_trajectories(
    trajectory_paths=["traj_0001.dcd", "traj_0002.dcd"],
    topology="topology.pdb",
    h5_path="md_dataset.h5",
    experiment_id="labels",
    rule="sum_distances",
    selection_pairs=[("index 10", "index 220")],
    lower_threshold=0.4,
    upper_threshold=0.8,
    window_size=25,
)

featurize_dataset(
    h5_path="md_dataset.h5",
    feature_set="closest",
    feature_type="closest_residue_distances",
    label_experiment_id="labels",
)

result = run_mltsa(
    "md_dataset.h5",
    "closest",
    model="random_forest",
    explanation_method="native",
    results_h5_path="md_results.h5",
    experiment_id="rf_native",
)

print(result.training_score)