MD Workflow
The MD workflow consists of four reusable steps: labeling trajectories, building feature sets, training and explaining a model, and exporting structures.
1. Label trajectories
Use mltsa.md.label_trajectories(...) to classify each trajectory as
IN, OUT, or TS based on the mean rule value over the final
window_size frames.
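The labeling rule can be illustrated with a short standalone sketch. It does not use mltsa and is not the library's implementation. It assumes that sum_distances means the sum of the distances between the selected atom pairs in each frame, and that a mean below lower_threshold gives IN, a mean above upper_threshold gives OUT, and anything in between gives TS. The helper names and the toy coordinates are hypothetical.

```python
from math import dist
from statistics import fmean


def sum_distances(frame, pairs):
    """Sum the Euclidean distances between atom-index pairs in one frame.

    Assumption: this mirrors what a "sum_distances" rule computes.
    """
    return sum(dist(frame[i], frame[j]) for i, j in pairs)


def label_trajectory(rule_values, lower_threshold, upper_threshold, window_size):
    """Label one trajectory from the mean rule value over its final frames.

    Assumption: low values map to IN and high values map to OUT.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    mean_value = fmean(rule_values[-window_size:])
    if mean_value < lower_threshold:
        return "IN"
    if mean_value > upper_threshold:
        return "OUT"
    return "TS"


# Toy trajectory: two atoms drifting apart along x, one frame per step.
frames = [[(0.0, 0.0, 0.0), (0.1 * t, 0.0, 0.0)] for t in range(1, 11)]
values = [sum_distances(frame, [(0, 1)]) for frame in frames]

label = label_trajectory(
    values, lower_threshold=0.4, upper_threshold=0.8, window_size=3
)
print(label)
```

Only the last window_size frames influence the label, so a trajectory that briefly crosses a threshold earlier on is still classified by where it ends up.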
2. Build feature sets
Use mltsa.md.featurize_dataset(...) to create one named feature set inside
the same MD HDF5 file.
Supported v1 feature families:
- closest_residue_distances
- all_ligand_protein_distances
- bubble_distances
- contact_map
- pca_xyz
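To give a feel for what a distance-based feature family might contain, here is a minimal NumPy sketch of a ligand-protein contact map for a single frame. It is an illustration only, not the mltsa implementation: the contact_map function, the cutoff parameter, and the toy coordinates are all assumptions, and the cutoff uses whatever length unit the coordinates use.

```python
import numpy as np


def contact_map(ligand_xyz, protein_xyz, cutoff):
    """Return a boolean (n_ligand, n_protein) matrix of atom pairs within cutoff."""
    # Broadcasting builds every ligand-protein displacement vector at once.
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    distances = np.linalg.norm(diff, axis=-1)
    return distances < cutoff


# Hypothetical single-frame coordinates: 2 ligand atoms, 3 protein atoms.
ligand = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
protein = np.array([[0.0, 0.3, 0.0], [1.0, 0.0, 2.0], [1.2, 0.0, 0.0]])

contacts = contact_map(ligand, protein, cutoff=0.5)

# Flatten to a 0/1 feature vector, one entry per ligand-protein atom pair.
features = contacts.astype(int).ravel()
print(features)
```

Dropping the cutoff comparison and keeping the raw distances matrix would give a feature vector in the spirit of an all-pairs distance family instead.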
3. Train and explain
Use mltsa.md.run_mltsa(...) to load a selected feature set, fit a model,
compute feature importances, and optionally save the results to a separate
HDF5 file.
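The fit-then-explain idea can be sketched with scikit-learn directly. This is not what run_mltsa does internally. It assumes that a random forest with a "native" explanation corresponds to the forest's built-in impurity-based importances, and it uses synthetic data in which only feature 2 determines the label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a feature set: one row per trajectory.
rng = np.random.default_rng(0)
n_trajectories, n_features = 300, 5
X = rng.normal(size=(n_trajectories, n_features))

# By construction, the label depends only on feature 2.
y = np.where(X[:, 2] > 0.0, "OUT", "IN")

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances: non-negative and normalized to sum to 1.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]
training_score = model.score(X, y)

print("most important feature:", ranking[0])
print("training accuracy:", training_score)
```

The score here is measured on the training data itself, so it shows how well the model fits the data rather than how well it generalizes.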
4. Export structures
Use mltsa.md.export_state_structures(...) to write multi-model IN.pdb,
OUT.pdb, and TS.pdb files from an existing label experiment.
Minimal example
from mltsa.md import featurize_dataset, label_trajectories, run_mltsa

# Step 1: label each trajectory as IN, OUT, or TS.
label_trajectories(
    trajectory_paths=["traj_0001.dcd", "traj_0002.dcd"],
    topology="topology.pdb",
    h5_path="md_dataset.h5",
    experiment_id="labels",
    rule="sum_distances",
    selection_pairs=[("index 10", "index 220")],
    lower_threshold=0.4,
    upper_threshold=0.8,
    window_size=25,
)

# Step 2: build a named feature set inside the same HDF5 file.
featurize_dataset(
    h5_path="md_dataset.h5",
    feature_set="closest",
    feature_type="closest_residue_distances",
    label_experiment_id="labels",
)

# Step 3: fit a model on the "closest" feature set and save the results.
result = run_mltsa(
    "md_dataset.h5",
    "closest",
    model="random_forest",
    explanation_method="native",
    results_h5_path="md_results.h5",
    experiment_id="rf_native",
)

print(result.training_score)