Molecular Dynamics Analysis tool for MLTSA

The “MD_DATA” package contains the functions for analyzing Molecular Dynamics trajectories before MLTSA is applied to MD data. It covers CV selection, CV calculation for dataset generation, and binary labeling of trajectories (similar to IN/OUT in ligand unbinding) based on manually defined distances. Everything is implemented on top of the mdtraj package, so it can read the same topology and trajectory files as mdtraj.

This module contains the code for analyzing the molecular dynamics simulations, whose trajectories are stored as dcd files. It also handles the CV generation that feeds into the MLTSA pipeline.

class CV_from_MD.CVs(top)

This class is a container for the Collective Variables (CVs) that will later be calculated with mdtraj. It provides different methods for defining the CVs to be calculated for MLTSA.

define_variables(CV_type, custom_selection_string=None, CV_indices=None)

This method defines the variables according to the type of CV passed. For example, the label ‘all’ calculates every interatomic distance between all atoms. Custom CVs can be passed as pairs of atom indices to calculate distances on, using CV_indices=list and CV_type=“custom_CVs”. Alternatively, an atom selection in mdtraj’s syntax can be passed with custom_selection_string= and CV_type=“custom_selection” to select the atoms from which all distances are calculated.

Parameters
  • CV_type – str Label to specify the CV definition. It can be “all” for all atoms, “Calpha_water” for ligand+water+Calpha atoms, “Calpha” for ligand+Calpha atoms, “all_closest_atoms” for all closest atoms between residues, “all_closest_heavy_atoms” for all closest heavy inter-residue atoms, “bubble_ligand” for all distances between ligand and protein within a 6 Angstrom bubble around the ligand, “custom_CVs” for a selected set of CV_indices to be passed, and “custom_selection” to pass a custom_selection_string written in mdtraj’s atom selection syntax.

  • custom_selection_string – str Atom selection written in mdtraj’s atom selection reference syntax. The selected atom indices are used for the CV definition.

  • CV_indices – list, array CVs can be defined outside of this class and passed here as atom indices.
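
As an illustration of the “custom_CVs” input described above, the following is a minimal sketch of how pairs of atom indices could be built outside the class. The helper name all_pairs and the example indices are hypothetical, and the exact container type the package expects for CV_indices is an assumption.

```python
from itertools import combinations


def all_pairs(atom_indices):
    """Return every unique [i, j] pair (i < j) from a collection of atom indices."""
    unique_sorted = sorted(set(atom_indices))
    return [list(pair) for pair in combinations(unique_sorted, 2)]


# Hypothetical atom indices, e.g. the result of an mdtraj atom selection.
selected_atoms = [12, 3, 47]
pairs = all_pairs(selected_atoms)
print(pairs)  # [[3, 12], [3, 47], [12, 47]]
```

Following the signature documented above, a list like this would then be passed as CV_indices=pairs together with CV_type=“custom_CVs”.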

Returns

class CV_from_MD.MDs

Analyzer wrapper based on mdtraj that can generate distances from a previously defined CVs object with calculate_CVs(). It can also take a list of dcd files and a topology, along with a set of selection strings and upper/lower values, to label simulations automatically with label_simulations().

calculate_CVs(CVs, dcd_paths, loading='normal', iter_chunk=None)

Method for calculating the previously defined Collective Variables. It takes a CVs object along with the list of trajectories from which to calculate the data. Different loading methods are available depending on the complexity of the dataset to analyze.

Parameters
  • CVs – class CVs object with a set of CVs already defined. It will be used to calculate the distances.

  • dcd_paths – list List of strings containing the paths to the different .dcd/trajectory files.

  • loading – str Label for the type of trajectory loading to use, it can affect the performance.
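
To make the calculation concrete, here is a minimal pure-Python sketch of turning atom-index pairs into a distance dataset with one row per frame and one column per CV. It is only a stand-in: the package relies on mdtraj for this step (mdtraj provides compute_distances for the same purpose), and the function name pair_distances and the toy coordinates are hypothetical.

```python
import math


def pair_distances(frames, atom_pairs):
    """Compute Euclidean distances for each atom pair in each frame.

    frames: list of frames, each a list of (x, y, z) tuples, one per atom.
    atom_pairs: list of (i, j) atom-index pairs.
    Returns a list with one row per frame and one column per pair.
    """
    return [
        [math.dist(frame[i], frame[j]) for i, j in atom_pairs]
        for frame in frames
    ]


# Toy trajectory: 2 frames, 3 atoms (coordinates are made up).
frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0), (6.0, 8.0, 0.0), (0.0, 0.0, 2.0)],
]
atom_pairs = [(0, 1), (0, 2)]
data = pair_distances(frames, atom_pairs)
print(data)  # [[5.0, 1.0], [10.0, 2.0]]
```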

Returns

label_simulations(top, dcd_paths, selection_strings_to_label, upper_lim, lower_lim, loading='normal', end_perc=0.25, get_sum=True, plotting_sum=False, plotting_all=False, show_plots=False, save_labels=False, save_plots=False, save_path='')

Method for labeling a given set of trajectory files, using the string selections to check and the upper/lower limits for classification. It can also plot the values of each distance throughout the trajectories and save the figures in the specified path.

Parameters
  • top – str Path to the topology file to use (.pdb/.psf) or any mdtraj compatible topology file.

  • dcd_paths – list List containing the paths to the trajectory files (.dcd/other)

  • selection_strings_to_label – str String selection using mdtraj’s atom selection reference syntax.

  • upper_lim – float Upper limit which sets the OUT label for the trajectories when labeled. Anything greater than this will be labeled OUT. Anything smaller than this and greater than lower_lim will be labeled UCL.

  • lower_lim – float Lower limit which sets the IN label for the trajectories when labeled. Anything smaller than this will be labeled IN. Anything greater than this and smaller than upper_lim will be labeled UCL.

  • loading – str Label to specify the loading procedure, affects performance.

  • plotting – boolean Determines whether to plot in matplotlib the evolution of the labeling distances throughout the trajectories. Figures will be saved in the given save_path, one per simulation.

  • show_plots – boolean Whether to show the plots in the current session or not. If this is False and plotting=True and save_plots=True it will still save them without showing them.

  • save_labels – boolean Determines if the labels should be saved in a file on the desired destination with save_path.

  • save_plots – boolean Determines whether to save the plots or not. Figures will be saved in the given save_path, one per simulation.

  • save_path – str Path to save the figures generated by the labelling if plotting=True. If not specified it saves in the working directory.

Returns

list Returns the list of labelled simulations as [“IN”, “OUT”, etc.] for each trajectory passed in the dcd_paths list.
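
The thresholding described by upper_lim and lower_lim can be sketched as follows. This is an illustration under assumptions, not the package's implementation: it assumes end_perc is the final fraction of frames that gets evaluated and that the labeling value is the mean over those frames. The function name label_trajectory and the distance values are hypothetical.

```python
def label_trajectory(distances, lower_lim, upper_lim, end_perc=0.25):
    """Label one trajectory from a per-frame labeling distance.

    Assumption: only the last end_perc fraction of frames is evaluated,
    and its mean is compared against the two limits.
    """
    n_tail = max(1, int(len(distances) * end_perc))
    tail = distances[-n_tail:]
    mean_tail = sum(tail) / len(tail)
    if mean_tail > upper_lim:
        return "OUT"
    if mean_tail < lower_lim:
        return "IN"
    return "UCL"  # between lower_lim and upper_lim


# Made-up per-frame distances for three trajectories.
trajectories = [
    [2.0] * 8,                                    # stays close
    [2.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 12.0],  # moves away
    [2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 6.0, 6.0],     # ends in between
]
labels = [label_trajectory(d, lower_lim=3.0, upper_lim=10.0) for d in trajectories]
print(labels)  # ['IN', 'OUT', 'UCL']
```

The result has the same shape as the documented return value: one label per trajectory, in the order the trajectories were passed.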