MD Workflow =========== The MD workflow is organized as a small sequence of reusable steps. 1. Label trajectories --------------------- Use ``mltsa.md.label_trajectories(...)`` to classify each trajectory as ``IN``, ``OUT``, or ``TS`` based on the mean rule value over the final ``window_size`` frames. 2. Build feature sets --------------------- Use ``mltsa.md.featurize_dataset(...)`` to create one named feature set inside the same MD HDF5 file. Supported v1 feature families: - ``closest_residue_distances`` - ``all_ligand_protein_distances`` - ``bubble_distances`` - ``contact_map`` - ``pca_xyz`` 3. Train and explain -------------------- Use ``mltsa.md.run_mltsa(...)`` to load a selected feature set, fit a model, compute feature importances, and optionally save results to a separate results HDF5. 4. Export structures -------------------- Use ``mltsa.md.export_state_structures(...)`` to write multi-model ``IN.pdb``, ``OUT.pdb``, and ``TS.pdb`` files from an existing label experiment. Minimal example --------------- .. code-block:: python from mltsa.md import featurize_dataset, label_trajectories, run_mltsa label_trajectories( trajectory_paths=["traj_0001.dcd", "traj_0002.dcd"], topology="topology.pdb", h5_path="md_dataset.h5", experiment_id="labels", rule="sum_distances", selection_pairs=[("index 10", "index 220")], lower_threshold=0.4, upper_threshold=0.8, window_size=25, ) featurize_dataset( h5_path="md_dataset.h5", feature_set="closest", feature_type="closest_residue_distances", label_experiment_id="labels", ) result = run_mltsa( "md_dataset.h5", "closest", model="random_forest", explanation_method="native", results_h5_path="md_results.h5", experiment_id="rf_native", ) print(result.training_score)