Quickstart
The easiest way to use mols2grid is through the mols2grid.display
function. The input can be a DataFrame, a list of RDKit molecules, or an SDFile.
[1]:
# uncomment and run if you're on Google Colab
# !pip install rdkit mols2grid
# !wget https://raw.githubusercontent.com/rdkit/rdkit/master/Docs/Book/data/solubility.test.sdf
[2]:
from pathlib import Path
from rdkit import RDConfig
import mols2grid

# Use the copy shipped with RDKit if available, otherwise the downloaded file
SDF_FILE = (
    f"{RDConfig.RDDocsDir}/Book/data/solubility.test.sdf"
    if Path(RDConfig.RDDocsDir).is_dir()
    else "solubility.test.sdf"
)
Let's start with an SDFile (.sdf and .sdf.gz are both supported):
[3]:
mols2grid.display(SDF_FILE)
[3]:
From this interface, you can:
- Make simple text searches using the searchbar on the top right.
- Make substructure queries by clicking on SMARTS instead of Text and typing in the searchbar.
- Sort molecules by clicking on Sort and selecting a field (click the arrows on the right side of the Sort dropdown to reverse the order).
- View metadata by hovering your mouse over the i button of a cell; you can also press that button to anchor the information.
- Select a couple of molecules (click on a cell or on a checkbox, or navigate using your keyboard arrows and press the ENTER key).
- Export the selection to a SMILES or CSV file, or directly to the clipboard (this last functionality might be blocked depending on how you are running the notebook). If no selection was made, the entire grid is exported.
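To make the SMARTS option more concrete, the sketch below runs a comparable substructure filter directly with RDKit. It only illustrates what a SMARTS query matches and is not a description of how the widget performs its search internally; the molecules and the aromatic-ring pattern are made-up examples.

```python
from rdkit import Chem

# Illustrative molecules (not taken from the solubility dataset)
smiles_list = ["CCO", "c1ccccc1O", "CC(=O)O", "Cc1ccccc1"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

# SMARTS pattern for a six-membered aromatic carbon ring
query = Chem.MolFromSmarts("c1ccccc1")

# Keep the SMILES of every molecule that contains the pattern
hits = [smi for smi, mol in zip(smiles_list, mols) if mol.HasSubstructMatch(query)]
print(hits)
```

Typing the same pattern in the searchbar with SMARTS selected should narrow the grid in a similar way.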
We can also use a pandas DataFrame as input, containing a column of RDKit molecules (specified using mol_col=...) or SMILES strings (specified using smiles_col=...):
[4]:
df = mols2grid.sdf_to_dataframe(SDF_FILE)
subset_df = df.sample(50, random_state=0xac1d1c)
mols2grid.display(subset_df, mol_col="mol")
[4]:
Finally, we can also use a list of RDKit molecules:
[5]:
mols = subset_df["mol"].to_list()
mols2grid.display(mols)
[5]:
But the main point of mols2grid is that the widget lets you access your selections from Python afterwards:
[6]:
mols2grid.get_selection()
[6]:
{}
If you were using a DataFrame, you can get the subset corresponding to your selection with:
[7]:
df.iloc[list(mols2grid.get_selection().keys())]
[7]:
_Name | _MolFileInfo | _MolFileComments | ID | NAME | SOL | SMILES | SOL_classification | mol |
---|---|---|---|---|---|---|---|---|
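The selection above is empty, so the resulting table has no rows. The sketch below shows the same lookup on a non-empty selection, using only pandas. Both the DataFrame and the selection dictionary are hypothetical stand-ins: it assumes the selection maps the row position in the displayed DataFrame to the molecule's SMILES, which is what the iloc call implies.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame that was passed to mols2grid.display
df = pd.DataFrame(
    {
        "NAME": ["ethanol", "benzene", "acetic acid", "phenol"],
        "SMILES": ["CCO", "c1ccccc1", "CC(=O)O", "Oc1ccccc1"],
    }
)

# Hypothetical selection (assumed shape: {row position: SMILES})
selection = {1: "c1ccccc1", 3: "Oc1ccccc1"}

# Same lookup as above: the dictionary keys are used as positional indices
selected_df = df.iloc[list(selection.keys())]
print(selected_df)
```

Because iloc is positional, the lookup should be done on the same DataFrame that was displayed in the grid, not on a differently ordered or filtered copy.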
Finally, you can save the grid as a standalone HTML document. Simply replace display with save and add the path to the output file with output="path/to/molecules.html":
[8]:
mols2grid.save(mols, output="quickstart-grid.html")