Quickstart

The easiest way to use mols2grid is through the mols2grid.display function. The input can be a DataFrame, a list of RDKit molecules, or an SDFile.

[1]:
# uncomment and run if you're on Google Colab
# !pip install rdkit mols2grid
# !wget https://raw.githubusercontent.com/rdkit/rdkit/master/Docs/Book/data/solubility.test.sdf
[2]:
from pathlib import Path

from rdkit import RDConfig

import mols2grid


SDF_FILE = (
    f"{RDConfig.RDDocsDir}/Book/data/solubility.test.sdf"
    if Path(RDConfig.RDDocsDir).is_dir()
    else "solubility.test.sdf"
)

Let’s start with an SDFile (.sdf and .sdf.gz are both supported):

[3]:
mols2grid.display(SDF_FILE)
[3]:

From this interface, you can:

  • Make simple text searches using the searchbar on the top right.

  • Make substructure queries by clicking on SMARTS instead of Text and typing in the searchbar.

  • Sort molecules by clicking on Sort and selecting a field (click the arrows on the right side of the Sort dropdown to reverse the order).

  • View metadata by hovering your mouse over the i button of a cell. You can also click that button to pin the information in place.

  • Select a couple of molecules (click on a cell or on a checkbox, or navigate using your keyboard arrows and press the ENTER key).

  • Export the selection to a SMILES or CSV file, or directly to the clipboard (this last functionality might be blocked depending on how you are running the notebook). If no selection was made, the entire grid is exported.
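
The SMARTS search runs inside the widget, but the same kind of query can be reproduced in Python. The snippet below is a minimal sketch using RDKit's own substructure matching, not the widget's implementation; the SMILES list and the benzene-ring pattern are illustrative choices.

```python
from rdkit import Chem

# Illustrative molecules (not taken from the solubility dataset)
example_smiles = ["CCO", "c1ccccc1O", "CC(=O)O", "c1ccccc1"]
example_mols = [Chem.MolFromSmiles(smi) for smi in example_smiles]

# SMARTS pattern for a six-membered aromatic carbon ring
pattern = Chem.MolFromSmarts("c1ccccc1")

# Keep the input SMILES of every molecule that contains the pattern
hits = [
    smi
    for smi, mol in zip(example_smiles, example_mols)
    if mol.HasSubstructMatch(pattern)
]
print(hits)
```

Here only the phenol and benzene entries contain the aromatic ring, so these are the two molecules kept in hits.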

We can also use a pandas DataFrame as input, containing a column of RDKit molecules (specified using mol_col=...) or SMILES strings (specified using smiles_col=...):

[4]:
df = mols2grid.sdf_to_dataframe(SDF_FILE)
subset_df = df.sample(50, random_state=0xac1d1c)
mols2grid.display(subset_df, mol_col="mol")
[4]:

Finally, we can also use a list of RDKit molecules:

[5]:
mols = subset_df["mol"].to_list()
mols2grid.display(mols)
[5]:

But the main point of mols2grid is that the widget lets you access your selections from Python afterwards:

[6]:
mols2grid.get_selection()
[6]:
{}

If you were using a DataFrame, you can get the subset corresponding to your selection with:

[7]:
df.iloc[list(mols2grid.get_selection().keys())]
[7]:
_Name _MolFileInfo _MolFileComments ID NAME SOL SMILES SOL_classification mol
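
The cell above returns only the column headers because nothing was selected, so the selection is an empty dictionary. The sketch below illustrates the lookup with a stand-in DataFrame and a hypothetical selection. It assumes, based on the iloc call above, that the keys are row positions in the DataFrame that was displayed; the SMILES values in the dictionary are an assumption as well.

```python
import pandas as pd

# Small stand-in for the DataFrame shown in the grid (illustrative data)
demo_df = pd.DataFrame(
    {
        "NAME": ["ethanol", "benzene", "acetic acid", "phenol"],
        "SMILES": ["CCO", "c1ccccc1", "CC(=O)O", "Oc1ccccc1"],
    }
)

# Hypothetical selection standing in for mols2grid.get_selection():
# keys are assumed to be row positions, values the matching SMILES
selection = {1: "c1ccccc1", 3: "Oc1ccccc1"}


def subset_from_selection(frame, selected):
    """Return the rows of `frame` at the positions stored as keys of `selected`."""
    return frame.iloc[list(selected.keys())]


selected_df = subset_from_selection(demo_df, selection)

# An empty selection yields an empty DataFrame that still has its columns
empty_df = subset_from_selection(demo_df, {})
```

Since iloc is positional, this presumably only gives the intended rows when applied to the same DataFrame, in the same row order, that was displayed in the grid.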

Finally, you can save the grid as a standalone HTML document. Simply replace display with save and add the path to the output file with output="path/to/molecules.html".

[8]:
mols2grid.save(mols, output="quickstart-grid.html")