These notebooks demonstrate basic tasks implemented with libmolgrid. They cover usage with PyTorch and with Keras on the TensorFlow backend, and show how to implement and train a few types of models. For examples of usage with Caffe, see the gnina scripts and gnina models repositories.
Input files
The “types” files expected as input to ExampleProvider are text files in which each line is one training example. Each line starts with a few columns of numerical labels, followed by the paths of the molecular structure files, like so:
1 6.05 0.162643 4kqp/4kqp_rec_0.gninatypes 4kqp/4kqp_min_0.gninatypes
1 6.05 0.216481 4kqp/4kqp_rec_0.gninatypes 4kqp/4kqp_docked_0.gninatypes
0 -6.05 4.28411 4kqp/4kqp_rec_0.gninatypes 4kqp/4kqp_docked_1.gninatypes
0 -6.05 2.50741 4kqp/4kqp_rec_0.gninatypes 4kqp/4kqp_docked_2.gninatypes
0 -6.05 2.78808 4kqp/4kqp_rec_0.gninatypes 4kqp/4kqp_docked_3.gninatypes
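The column layout shown above can be illustrated with a small pure-Python parser. This is only a sketch of the convention and not libmolgrid's own parsing code: the names TypesExample and parse_types_line are hypothetical, and it assumes that every leading whitespace-separated token that parses as a number is a label and that everything after the first non-numeric token is a structure file path.

```python
from typing import List, NamedTuple


class TypesExample(NamedTuple):
    """One line of a types file (hypothetical helper type, not part of libmolgrid)."""
    labels: List[float]  # leading numeric columns
    files: List[str]     # remaining columns, treated as structure file paths


def _is_number(token: str) -> bool:
    """Return True if the token can be read as a floating-point number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_types_line(line: str) -> TypesExample:
    """Split a types line into numeric labels and structure file paths.

    Assumption: labels come first, and the first non-numeric token marks
    the start of the file columns.
    """
    tokens = line.split()
    n_labels = 0
    while n_labels < len(tokens) and _is_number(tokens[n_labels]):
        n_labels += 1
    labels = [float(t) for t in tokens[:n_labels]]
    return TypesExample(labels, tokens[n_labels:])


example = parse_types_line(
    "1 6.05 0.162643 4kqp/4kqp_rec_0.gninatypes 4kqp/4kqp_min_0.gninatypes"
)
print(example.labels)
print(example.files)
```

In practice ExampleProvider reads these files directly, so a parser like this is mainly useful for checking a types file before training.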
Structure file paths should preferably be relative (the data_root setting can be used to specify what they are relative to). The gninatypes format is a minimal file format that contains nothing other than Cartesian coordinates and atom types. It can only be used with indexed atom types (as opposed to vector types). These files are generated using the gninatyper binary that is part of gnina.
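To make the idea of a coordinates-plus-type file concrete, here is a minimal sketch of reading such a file with a path resolved against a root directory. The record layout is an assumption and is not stated in this text: the sketch treats each atom as three little-endian 32-bit floats (x, y, z) followed by one 32-bit integer type index. Check the gninatyper source before relying on it. The function name read_gninatypes is hypothetical, and its data_root argument only mimics the setting of the same name by joining paths.

```python
import os
import struct
import tempfile
from typing import List, Tuple

# ASSUMED record layout (not confirmed by this document):
# x, y, z as little-endian float32, then the atom type index as int32.
RECORD = struct.Struct("<fffi")


def read_gninatypes(path: str, data_root: str = "") -> List[Tuple[float, float, float, int]]:
    """Read (x, y, z, type_index) records from a file under the assumed layout.

    The path is resolved relative to data_root, mirroring how relative
    paths in a types file are interpreted.
    """
    full_path = os.path.join(data_root, path)
    with open(full_path, "rb") as handle:
        data = handle.read()
    if len(data) % RECORD.size != 0:
        raise ValueError(f"{full_path}: size is not a multiple of {RECORD.size} bytes")
    return list(RECORD.iter_unpack(data))


# Round-trip demonstration on a synthetic two-atom file (made-up values).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "demo"))
    with open(os.path.join(root, "demo", "lig.gninatypes"), "wb") as handle:
        handle.write(RECORD.pack(1.5, -2.0, 0.25, 3))
        handle.write(RECORD.pack(0.0, 4.0, 8.0, 7))
    atoms = read_gninatypes("demo/lig.gninatypes", data_root=root)

print(atoms)
```

The size check guards against reading a file that does not match the assumed layout; a real gninatypes file that fails it would indicate that the assumption above is wrong.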
Individual gninatypes files can be assembled into a single large, efficient-to-access “cache” file using the script create_caches2.py.