PDB Reader

Background

The Protein Data Bank (PDB) format provides a relatively straightforward way to describe biomolecular structures. The RCSB Protein Data Bank (RCSB PDB) has information on thousands of structures. Unfortunately, simulating these structures is not always straightforward, because the deposited files do not always model all relevant information explicitly. Modifying the structures can be especially difficult, even for experienced users.

Most MolCube projects start with a PDB Reader project. PDB Reader parses a .pdb or .cif file, determines what structures already exist in the file, and allows manipulating the structure to add or remove features like Disulfide bonds, Mutations, Glycosylations, etc.

A finished PDB Reader project can then be used for more complex operations like Solvation, embedding in a Membrane, and calculating Free Energy.

Overview

The general procedure for using PDB Reader in MolCube-API Client goes like this:

Authenticate server connection.
Create project.
Select chains.
Manipulate structure (optional).
Finalize model.
Download project (optional).

The example below shows how this works in the simplest case by using only the default settings that you’d see on the MolCube Apps site. This is equivalent to entering a PDB ID and just clicking “Next” until you get to the final page, then clicking “Download Project”.

import molcube as mc
from pprint import pprint

molcube = mc.API('alphaapi.molcube.com', 443)
molcube.authenticate(api_token=api_access_key)

#
# Initialize project by downloading structure from RCSB
#
pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.create_project(title='test-defaults', ff='charmmff', pdbId='2hac')
pdbreader.set_defaults()

#
# if modifying chain selection, do so here
#
assert pdbreader.confirm_chains()

#
# if modifying manipulation options, do so here
#
assert pdbreader.model_pdb()

pdbreader.download_project('myproject.tgz')

The assert keyword is prepended to each command that submits a step. This prevents the script from proceeding if a step fails, though it is not required.
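Because Python removes assert statements when it is run with the -O flag, a script that relies on them may silently continue past a failed step in that mode. The sketch below shows one possible alternative that raises an explicit error instead. It assumes only that each step method returns a truthy value on success; the helper name require_step and the stub class are hypothetical and are not part of the MolCube client.

```python
def require_step(name, ok):
    """Raise a descriptive error if a step reported failure."""
    if not ok:
        raise RuntimeError(f"PDB Reader step failed: {name}")
    return ok


class _StubProject:
    """Hypothetical stand-in for a PDB Reader project so the sketch runs offline."""

    def confirm_chains(self):
        return True

    def model_pdb(self):
        return False  # simulate a failed step


stub = _StubProject()
require_step("confirm_chains", stub.confirm_chains())

try:
    require_step("model_pdb", stub.model_pdb())
except RuntimeError as err:
    message = str(err)
    print(message)
```

With a real project object, the same helper would wrap calls such as pdbreader.confirm_chains() in place of the assert keyword.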

Create PDB Reader project

Creating a PDB Reader project requires setting a force field and project title, and either providing an RCSB pdbId or uploading a customPdb. Alternatively, if your project already exists, e.g. because you created it interactively in MolCube Apps, you can Resume an existing project.

Let’s walk through how to create a PDB Reader project.

The following optional arguments are available for the create_project() method:

correct_topo (bool): Correct chains and bonds information using distance between each atom. (default: False)
rename_dupl_atoms (bool): Rename hetero atoms if there are duplicate atom names. (default: True)
calc_pka (bool): Calculate pKa of protein residues to apply system pH. (default: False)

Two force field options are available: charmmff and amberff. Although MolCube supports martiniff and drudeff, these options are not yet supported in the Python API client.

For amberff, you can select different force field options. The default selections for amberff are as follows:

amberOptions = {
    "protein": "FF19SB",
    "dna": "OL15",
    "rna": "OL3",
    "glycan": "GLYCAM_06j",
    "lipid": "Lipid21",
    "water": "OPC"
}

Here are all available choices for amberOptions:

Protein: [FF19SB, FF14SB, FF14SBonlysc]
DNA: [OL15, BSC1]
RNA: [OL3, YIL, Shaw]
Glycan: [GLYCAM_06j]
Lipid: [Lipid21, Lipid17]
Water: [OPC, TIP3P, TIP4PEW, TIP4PD]

You can also find these options in the molcube.pdbreader.enums module:

from molcube.pdbreader import enums

print(f"Protein options: [{', '.join(enums.Protein)}]")
print(f"DNA options: [{', '.join(enums.DNA)}]")
print(f"RNA options: [{', '.join(enums.RNA)}]")
print(f"Glycan options: [{', '.join(enums.Glycan)}]")
print(f"Lipid options: [{', '.join(enums.Lipid)}]")
print(f"Water options: [{', '.join(enums.Water)}]")
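A typo in a force field name is easy to make, so it can help to check an amberOptions dict locally before creating a project. The following is a minimal sketch under stated assumptions: the choice table is copied by hand from the lists on this page rather than read from molcube.pdbreader.enums, and the helper name check_amber_options is hypothetical.

```python
# Choices copied from the lists above; keys follow the default amberOptions dict.
AMBER_CHOICES = {
    "protein": ["FF19SB", "FF14SB", "FF14SBonlysc"],
    "dna": ["OL15", "BSC1"],
    "rna": ["OL3", "YIL", "Shaw"],
    "glycan": ["GLYCAM_06j"],
    "lipid": ["Lipid21", "Lipid17"],
    "water": ["OPC", "TIP3P", "TIP4PEW", "TIP4PD"],
}


def check_amber_options(options):
    """Return a list of problems found in an amberOptions dict (empty if none)."""
    problems = []
    for key, value in options.items():
        if key not in AMBER_CHOICES:
            problems.append(f"unknown category: {key}")
        elif value not in AMBER_CHOICES[key]:
            problems.append(f"{key}: {value!r} not in {AMBER_CHOICES[key]}")
    return problems


good_options = {
    "protein": "FF14SB", "dna": "OL15", "rna": "OL3",
    "glycan": "GLYCAM_06j", "lipid": "Lipid21", "water": "TIP3P",
}
bad_options = dict(good_options, water="TIP3")  # deliberate typo

print(check_amber_options(good_options))
print(check_amber_options(bad_options))
```

An empty list means every entry matches one of the documented choices.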

Fetch PDB from RCSB

Using the pdbId keyword argument will attempt to obtain the PDB from RCSB automatically. create_project() returns True on success:

# Create a PDB Reader project and fetch PDB from RCSB using PDB ID
pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.create_project(title='test', ff='charmmff', pdbId="2hac")

Upload a custom PDB file

If you already have a local copy of your structure, you can pass the path to your structure with the customPdb keyword argument:

pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.create_project(title='test', ff='charmmff', customPdb="files/2hac.cif")

MolCube recognizes structures in PDB (.pdb), PDBx/mmCIF (.cif), and GROMACS (.gro) formats.

Resume an existing project

An existing project can be resumed by passing the project ID to resume_project():

pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.resume_project(project_id='b33384ed-7e4e-48cb-9afd-b2f0fe6456a2')

Search project list

If you don’t already know the ID, you can use search_projects().

Acceptable args:

page (int): page to return (default: 1)
perPage (int): number of results per page (default: 10)
keyword (str): limit results to those containing a keyword
searchKey (title|pk): restrict keyword search to either the title string or project ID (pk)

Other args: projectStatus (str), projectStep (int), projectCategory (designer|builder), forceField (str), startDate (datetime), endDate (datetime), hasStandaloneLigand (bool), pdbAmberOption (dict). See API Reference for more detail.

Example:

>>> search_results = molcube.search_projects()
>>> search_results
{'projects': [{'pk': '1eebb792-f267-4c01-9f9f-1b179819c3f3',
   'createdAt': '2026-04-02 13:51:33',
   'forcefieldType': 'charmmff',
   'projectType': 'PDB Reader',
   'title': 'My Test Project',
   'step': 2,
   'status': 'Success',
   'fileName': '2hac.cif',
   'sideChainOriented': False,
   'tag': None,
   'user': 'Your User Name',
   'team': None,
   'teamId': None,
   'workspace': 'Personal'},
  {'pk': '205dffb4-2b15-43ce-9228-305e6ae510a6', ...},
  ...,
 ],
 'totalPages': 2,
 'currentPage': 1,
 'totalCount': 11,
 'hasNext': True,
 'hasPrevious': False}

E.g., to resume the most recent project, use the first pk from the returned object:

my_projects = search_results['projects']
project_id = my_projects[0]['pk']

pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.resume_project(project_id=project_id)
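Since results are paginated, finding a project may require walking through several pages. Below is a rough sketch of one way to do that, relying only on the page argument and the 'projects' and 'hasNext' keys shown in the example output. The helper iter_all_projects is hypothetical, and fake_search is a stand-in so the sketch runs without a server; in practice you would pass molcube.search_projects instead.

```python
from typing import Callable, Iterator


def iter_all_projects(search: Callable[..., dict], **filters) -> Iterator[dict]:
    """Yield every project dict across all result pages."""
    page = 1
    while True:
        result = search(page=page, **filters)
        yield from result["projects"]
        if not result.get("hasNext"):
            break
        page += 1


def fake_search(page=1, perPage=10, **_):
    """Offline stand-in that mimics the paginated response shape."""
    data = [{"pk": f"id-{i}", "title": f"Project {i}"} for i in range(11)]
    start = (page - 1) * perPage
    return {
        "projects": data[start:start + perPage],
        "currentPage": page,
        "hasNext": start + perPage < len(data),
    }


all_projects = list(iter_all_projects(fake_search))
matches = [p["pk"] for p in all_projects if p["title"] == "Project 10"]
print(len(all_projects), matches)
```

Filtering on the client side as shown is only one option; the keyword and searchKey arguments let the server do the filtering instead.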

Check available info about PDB

The get_chains() method returns a list of chains for each chain type and (where applicable) the available terminal caps:

>>> pdbreader.get_chains()
{'protein': [{'chainIndex': 'PROT_A',
   'terminal': {'nter': ['NTER', 'NNEU', 'ACE', 'NONE'],
    'cter': ['CTER', 'CNEU', 'CT1', 'CT2', 'CT3', 'NONE']},
   'nsdTerminal': {'nter': ['ACE'], 'cter': ['NONE']},
   'chainId': 'A'},
  {'chainIndex': 'PROT_B',
   'terminal': {'nter': ['NTER', 'NNEU', 'ACE', 'NONE'],
    'cter': ['CTER', 'CNEU', 'CT1', 'CT2', 'CT3', 'NONE']},
   'nsdTerminal': {'nter': ['ACE'], 'cter': ['NONE']},
   'chainId': 'B'}],
 'nucleicAcid': [],
 'standaloneLigand': [],
 'heme': [],
 'ion': [],
 'glycan': [],
 'water': []}

The get_pdb_info() method returns a large dict with all info the MolCube server was able to parse from the structure:

>>> pdb_info = pdbreader.get_pdb_info()
>>> pdb_info.keys()

dict_keys(['ph', 'pdbId', 'source',
   'forceFieldType', 'models', 'availResnames',
   'resnames', 'titrableResidues',
   'ptmResidues', 'protonationStates',
   'ssbondResidues', 'phosphorylatableResidues',
   'phosphorylationStates', 'staplingPatches',
   'missingResidues', 'ssbonds',
   'glycosylations', 'hemes', 'staplings',
   'covalentLigands', 'nonStandards',
   'terminalCappings', 'acidsOptions',
   'surfaceProteinResidues', 'calcPka',
   'invalidCovalentLigands', 'selectedChains',
   'ffGeneration', 'ffGenAtomType'])

Check default settings

To see what settings MolCube would use by default, use get_defaults():

>>> pdbreader.get_defaults()

{'projectPk': '1eebb792-f267-4c01-9f9f-1b179819c3f3',
 'ph': 7.0,
 'chain': {'calcPka': False,
  'ffGeneration': None,
  'ssbond': [{'residue1': {'chainIndex': 'PROT_A', 'resid': '2'},
    'residue2': {'chainIndex': 'PROT_B', 'resid': '2'}}],
  'glycosylation': [],
  'heme': [],
  'protein': [{'chainIndex': 'PROT_A', 'missing': [], 'selected': True},
   {'chainIndex': 'PROT_B', 'missing': [], 'selected': True}],
  'nucleicAcid': [],
  'standaloneLigand': [],
  'ion': [],
  'glycan': [],
  'water': [],
  'projectPk': '1eebb792-f267-4c01-9f9f-1b179819c3f3',
  'ph': 7.0},
 'glycosylation': [],
 'ffGeneration': None,
 'ssbond': [{'residue1': {'chainIndex': 'PROT_A', 'resid': '2'},
   'residue2': {'chainIndex': 'PROT_B', 'resid': '2'}}],
 'heme': []}

This is the format of the request that will be sent to the server if you use the defaults. To tell the pdbreader object to set its internal settings to match the server’s defaults, use set_defaults() with no arguments:

# you do not need to call get_defaults() if you
# are not going to edit the defaults dict manually
pdbreader.set_defaults()

While you could edit the manipulation options dict directly and pass the result as an argument to set_defaults(), it is easier to use the dedicated manipulation methods, which are demonstrated below.
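Before submitting anything, it can be useful to see at a glance which chains the defaults would select. The sketch below works on a plain dict shaped like the get_defaults() output above, abbreviated here. The helper name selected_chains is hypothetical, and the code assumes that selectable entries carry the 'chainIndex' and 'selected' keys as in that output.

```python
def selected_chains(settings):
    """Return {chain type: [chainIndex, ...]} for chains marked selected."""
    result = {}
    for chain_type, entries in settings["chain"].items():
        if not isinstance(entries, list):
            continue  # skip scalar values such as 'ph' or 'calcPka'
        picked = [
            entry["chainIndex"]
            for entry in entries
            if isinstance(entry, dict) and entry.get("selected")
        ]
        if picked:
            result[chain_type] = picked
    return result


# Abbreviated copy of the defaults shown above.
defaults = {
    "projectPk": "1eebb792-f267-4c01-9f9f-1b179819c3f3",
    "ph": 7.0,
    "chain": {
        "calcPka": False,
        "ssbond": [{"residue1": {"chainIndex": "PROT_A", "resid": "2"},
                    "residue2": {"chainIndex": "PROT_B", "resid": "2"}}],
        "protein": [
            {"chainIndex": "PROT_A", "missing": [], "selected": True},
            {"chainIndex": "PROT_B", "missing": [], "selected": True},
        ],
        "water": [],
        "ph": 7.0,
    },
}

print(selected_chains(defaults))
# {'protein': ['PROT_A', 'PROT_B']}
```

Entries without a 'selected' key, such as the ssbond records, are ignored rather than treated as chains.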

A quick summary of supported manipulations:

Chain selection: toggle_chain(), toggle_chains_by_type()
Terminal patching: set_terminal_patch(), get_terminal_residues()
Mutations: add_mutation(), remove_mutation()
Phosphorylation: add_phosphorylation(), remove_phosphorylation()
Protonation: add_protonation(), remove_protonation()
Disulfide bonds: add_ssbond(), remove_ssbond()
Peptide stapling: add_staple(), remove_staple(), get_valid_staples()
Missing residue modeling: add_missing_residues(), remove_missing_residues(), get_valid_missing_terminals()
Side chain orientation: orient_side_chains()
Glycosylation: See examples in Glycosylation.

Chain selection

The chain selection functions are toggle_chain() and toggle_chains_by_type(). The default chain selection when using set_defaults() is to select all chains except for water. You only need to call one of these methods if deviating from this default.

Both functions take up to two arguments:

toggle_chain() args:
    enable (str | list[str]): chain ID or list of chain IDs to enable
    disable (str | list[str]): chain ID or list of chain IDs to disable

toggle_chains_by_type() args:
    enable (str | list[str]): a category or list of categories to enable
    disable (str | list[str]): same as above

Chain types:

protein
nucleicAcid
standaloneLigand
ion
water
glycan

See molcube.pdbreader.enums.CHAIN_TYPE:

>>> from molcube.pdbreader import enums
>>> print(f"Chain types: [{', '.join(enums.CHAIN_TYPE)}]")
Chain types: [protein, nucleicAcid,
   standaloneLigand, heme, ion, water, glycan]

Usage example

3PQR has several chains, as shown below:

import molcube as mc
from pprint import pprint

molcube = mc.API('api.molcube.com', 443)
molcube.authenticate(api_token=api_access_key)

pdbreader = molcube.create_pdb_reader_project()
pdbreader.create_project(title='test-chains', ff='charmmff', customPdb='files/3pqr.cif')
chains = pdbreader.get_chains()
pdbreader.set_defaults()

>>> pdbreader
<PdbReaderProject with settings: {
    "projectPk": "5fcd30b3-281c-4962-8dec-3bdda99baaa6",
    "ph": 7.0,
    "chain": {
        "calcPka": false,
        "ssbond": [ ... ],
        "glycosylation": [ ... ],
        "protein": [
            { "chainIndex": "PROT_A", "missing": [], "selected": true, "terminal": { "nter": "NTER", "cter": "CTER" } },
            { "chainIndex": "PROT_B", "missing": [], "selected": true, "terminal": { "nter": "NTER", "cter": "CTER" } } ],
        "nucleicAcid": [],
        "standaloneLigand": [
            { "chainIndex": "HETE_C", "selected": false },
            { "chainIndex": "HETE_D", "selected": false },
            { "chainIndex": "HETE_E", "selected": false } ],
        "ion": [],
        "glycan": [
            { "chainIndex": "GLYC_A", "selected": true },
            { "chainIndex": "GLYC_B", "selected": true },
            { "chainIndex": "GLYC_C", "selected": true },
            { "chainIndex": "GLYC_D", "selected": true },
            { "chainIndex": "GLYC_E", "selected": true } ],
        "water": [
            { "chainIndex": "WATE_A", "selected": false },
            { "chainIndex": "WATE_B", "selected": false } ],
        "ph": 7.0
    },
    "glycosylation": [ ... ], "ffGeneration": null,
    "ssbond": [ ... ],
}>

Chains can be enabled/disabled individually or by category:

# enable or disable a single chain
pdbreader.toggle_chain(enable='PROT_A')
pdbreader.toggle_chain(disable='GLYC_A')
# enable and disable can also be combined in a single call
pdbreader.toggle_chain(enable='PROT_A', disable='GLYC_B')

# enable or disable multiple chains
pdbreader.toggle_chain(enable=['PROT_A', 'GLYC_C'],
                       disable='PROT_B')

# disable everything except protein
pdbreader.toggle_chains_by_type(enable='protein',
   disable=['glycan', 'water', 'ion', 'standaloneLigand'])
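Rather than typing out every category to disable, the list can be derived from the chain types. This is a small sketch, assuming the type names printed from enums.CHAIN_TYPE above; the helper all_types_except is hypothetical, and whether the server accepts types that have no chains in the structure (or 'heme') in the disable list is not confirmed here.

```python
# Chain type names as printed from enums.CHAIN_TYPE above.
CHAIN_TYPES = ["protein", "nucleicAcid", "standaloneLigand",
               "heme", "ion", "water", "glycan"]


def all_types_except(keep):
    """Return every chain type that is not in `keep` (a str or list of str)."""
    keep = [keep] if isinstance(keep, str) else list(keep)
    unknown = [t for t in keep if t not in CHAIN_TYPES]
    if unknown:
        raise ValueError(f"unknown chain type(s): {unknown}")
    return [t for t in CHAIN_TYPES if t not in keep]


disable = all_types_except("protein")
print(disable)
# ['nucleicAcid', 'standaloneLigand', 'heme', 'ion', 'water', 'glycan']

# With a real project object this could then be used as:
# pdbreader.toggle_chains_by_type(enable="protein", disable=disable)
```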

Confirm chain selection (required)

After using set_defaults() and (optionally) a toggle function, your settings are still local to your machine. To tell the MolCube server to apply your chain selection, you must use the confirm_chains() method.

Accepted arguments:

ph (float): pH to use (default: 7.0)
model (int): PDB model to use (default: 1st model)

This is equivalent to pressing “Submit” on the chain selection page of PDB Reader:

assert pdbreader.confirm_chains()

Model manipulation (required)

The sections below demonstrate the individual model manipulations. Each of them is optional, but the model_pdb() step itself is required.

Use model_pdb() to confirm model manipulation. This must be performed after confirm_chains().

In the simplest case, where you want the default chain selection and default manipulations, this is all you need to do:

import molcube as mc
from pprint import pprint

molcube = mc.API('api.molcube.com', 443)
molcube.authenticate(api_token=api_access_key)

# simplest possible case: use defaults for everything
pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.create_project(title='test-defaults', ff='charmmff', customPdb='files/2hac.cif')
pdbreader.set_defaults()

#
# if modifying chain selection, do so here
#

assert pdbreader.confirm_chains()

#
# if modifying manipulation options, do so here
#

assert pdbreader.model_pdb()

Terminal patching

Each protein chain returned by get_chains() has a 'terminal' key with lists of valid N-terminal ('nter') and C-terminal ('cter') patches. The default terminal patch is the first one in each list. E.g., the defaults for PROT_A below are NTER and CTER:

>>> from pprint import pprint
>>> pprint(chains['protein'])
[{'chainId': 'A',
  'chainIndex': 'PROT_A',
  'nsdTerminal': {'cter': ['CT3'], 'nter': ['ACE']},
  'terminal': {'cter': ['CTER', 'CNEU', 'CT1', 'CT2', 'CT3', 'NONE'],
               'nter': ['NTER', 'NNEU', 'ACE', 'NONE']}},
 {'chainId': 'B',
  'chainIndex': 'PROT_B',
  'nsdTerminal': {'cter': ['NONE'], 'nter': ['ACE']},
  'terminal': {'cter': ['CTER', 'CNEU', 'CT1', 'CT2', 'CT3', 'NONE'],
               'nter': ['NTER', 'NNEU', 'ACE', 'NONE']}}]
>>> pprint({chain: pdbreader._option_by_chain[chain] for chain in ('PROT_A', 'PROT_B')})

{'PROT_A': {'chainIndex': 'PROT_A',
            'missing': [],
            'selected': True,
            'terminal': {'cter': 'CTER', 'nter': 'NTER'}},
 'PROT_B': {'chainIndex': 'PROT_B',
            'missing': [],
            'selected': True,
            'terminal': {'cter': 'CTER', 'nter': 'NTER'}}}

To set a different terminal patch, use set_terminal_patch().

Expected arguments:

chain_id (str, required): chain to set
nter (str, optional): use this patch, if given; else use default patch
cter (str, optional): use this patch, if given; else use default patch
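A patch name can be checked against the lists from get_chains() before it is applied. The following is a sketch only: check_terminal_patch is a hypothetical helper, the chains dict is an abbreviated copy of the output above, and passing the chain index (e.g. PROT_A) as chain_id is an assumption based on how the other manipulation methods are called on this page.

```python
def check_terminal_patch(chains, chain_index, nter=None, cter=None):
    """Raise ValueError if a requested patch is not listed for the chain."""
    by_index = {c["chainIndex"]: c for c in chains["protein"]}
    if chain_index not in by_index:
        raise ValueError(f"no protein chain {chain_index!r}")
    allowed = by_index[chain_index]["terminal"]
    for end, patch in (("nter", nter), ("cter", cter)):
        if patch is not None and patch not in allowed[end]:
            raise ValueError(
                f"{patch!r} is not a valid {end} patch for {chain_index}: {allowed[end]}"
            )
    return True


# Abbreviated copy of the get_chains() output shown above.
chains = {
    "protein": [
        {"chainId": "A", "chainIndex": "PROT_A",
         "terminal": {"nter": ["NTER", "NNEU", "ACE", "NONE"],
                      "cter": ["CTER", "CNEU", "CT1", "CT2", "CT3", "NONE"]}},
    ],
}

patch_ok = check_terminal_patch(chains, "PROT_A", nter="ACE", cter="CT3")
print(patch_ok)

# With a real project object (assumed call, using the documented arguments):
# pdbreader.set_terminal_patch(chain_id="PROT_A", nter="ACE", cter="CT3")
```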

Mutations

Point mutations are added with add_mutation() and removed with remove_mutation().

Expected arguments:

add_mutation() args:
    chain_id (str): chain containing residue to mutate
    resid (str): residue ID to mutate
    new_resname (str): name of residue to mutate to

remove_mutation() is used like above, but requires only the chain_id and resid arguments.

Example usage:

import molcube as mc
from pprint import pprint

molcube = mc.API('api.molcube.com', 443)
molcube.authenticate(api_token=api_access_key)

# create the project from a local structure file
pdbreader = molcube.create_pdb_reader_project()
pdbreader.create_project(title='test-mutation', ff='charmmff', customPdb='files/2klu.cif')

pdbreader.set_defaults()
assert pdbreader.confirm_chains()

pdbreader.add_mutation(chain_id='PROT_A', resid='364', new_resname='ASN')  # GLY 364 -> ASN
pdbreader.add_mutation(chain_id='PROT_A', resid='365', new_resname='ALA')  # PRO 365 -> ALA

assert pdbreader.model_pdb()
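Mutations are often written in shorthand such as G364N. The sketch below converts that notation into the keyword arguments used by add_mutation() in the example above. The helper parse_mutation is hypothetical; the table uses the standard one-letter and three-letter amino acid codes, and the residue names a given force field actually accepts (for example HSD, HSE or HSP rather than HIS) may differ, so treat the output as a starting point.

```python
import re

# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def parse_mutation(chain_id, code):
    """Turn shorthand like 'G364N' into add_mutation() keyword arguments."""
    match = re.fullmatch(r"([A-Z])(-?\d+)([A-Z])", code.strip().upper())
    if not match:
        raise ValueError(f"cannot parse mutation {code!r}")
    old, resid, new = match.groups()
    if old not in ONE_TO_THREE or new not in ONE_TO_THREE:
        raise ValueError(f"unknown amino acid code in {code!r}")
    return {"chain_id": chain_id, "resid": resid, "new_resname": ONE_TO_THREE[new]}


kwargs = parse_mutation("PROT_A", "G364N")
print(kwargs)
# {'chain_id': 'PROT_A', 'resid': '364', 'new_resname': 'ASN'}

# With a real project object:
# pdbreader.add_mutation(**kwargs)
```

Note that the original residue letter is only checked for being a valid code; the sketch does not verify that the structure really has that residue at the given position.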

Phosphorylation and Protonation

add_phosphorylation() and add_protonation() take the same arguments and differ only in which patches are considered valid.

Args:

chain_id (str): chain index, e.g. PROT_A
resid (str): residue ID to phosphorylate or protonate
patch (str): name of phosphorylation/titration patch to apply

You can find valid options in the dict returned by get_pdb_info():

pdb_info = pdbreader.get_pdb_info()
print('protonations:')
pprint( pdb_info['titrableResidues'] )
# protonations:
# {'ARG': ['RN1', 'RN2', 'RN3'],
#  'ASP': ['ASPP'],
#  'CYS': ['CYM'],
#  'GLU': ['GLUP'],
#  'HIE': ['HSP', 'HSD', 'HSE'],
#  'HIP': ['HSP', 'HSD', 'HSE'],
#  'HIS': ['HSP', 'HSD', 'HSE'],
#  'HSD': ['HSP', 'HSE'],
#  'HSE': ['HSP', 'HSD'],
#  'HSP': ['HSD', 'HSE'],
#  'LYS': ['LSN']}

print()
print('phosphorylations:')
pprint( pdb_info['phosphorylatableResidues'] )
# phosphorylations:
# {'SER': ['SP2', 'SP1'],
#  'THR': ['THPB', 'THP1'],
#  'TYR': ['TP2', 'TP1']}

Example usage:

import molcube as mc

molcube = mc.API('api.molcube.com', 443)
molcube.authenticate(api_token=api_access_key)

pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.create_project(title='test-phosphorylation-2klu', ff='charmmff', customPdb='files/2klu.cif')

pdbreader.set_defaults()
assert pdbreader.confirm_chains()

pdbreader.add_phosphorylation(chain_id='PROT_A', resid='394', patch='SP1')
pdbreader.add_phosphorylation(chain_id='PROT_A', resid='415', patch='SP1')
pdbreader.add_phosphorylation(chain_id='PROT_A', resid='431', patch='SP1')
assert pdbreader.model_pdb()
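The tables from get_pdb_info() map a residue name to the patches offered for it, so they can be queried in both directions. This is a minimal sketch using a hand-copied version of the 'titrableResidues' table shown above; the helper names are hypothetical, and the residue name for a given resid has to be looked up separately, since no lookup method is shown on this page.

```python
# Hand-copied from the 'titrableResidues' output shown above.
TITRABLE = {
    "ARG": ["RN1", "RN2", "RN3"],
    "ASP": ["ASPP"],
    "CYS": ["CYM"],
    "GLU": ["GLUP"],
    "HIE": ["HSP", "HSD", "HSE"],
    "HIP": ["HSP", "HSD", "HSE"],
    "HIS": ["HSP", "HSD", "HSE"],
    "HSD": ["HSP", "HSE"],
    "HSE": ["HSP", "HSD"],
    "HSP": ["HSD", "HSE"],
    "LYS": ["LSN"],
}


def patch_is_valid(table, resname, patch):
    """True if `patch` is listed for residue name `resname`."""
    return patch in table.get(resname, [])


def residues_accepting(table, patch):
    """List the residue names for which `patch` is offered."""
    return [resname for resname, patches in table.items() if patch in patches]


print(patch_is_valid(TITRABLE, "ASP", "ASPP"))
print(residues_accepting(TITRABLE, "HSP"))
```

The same two helpers work unchanged on the 'phosphorylatableResidues' table.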

Disulfide bonds

Disulfide bonds present in the PDB are shown in pdb_info:

>>> pdb_info = pdbreader.get_pdb_info()
>>> pdb_info['ssbonds']
[{'residue1': {'chainIndex': 'PROT_A', 'resid': '2'},
  'residue2': {'chainIndex': 'PROT_B', 'resid': '2'}}]

Disulfide bonds are added with add_ssbond() and removed with remove_ssbond().

Both require the same arguments:

residue1 (str | dict): first ssbond residue
residue2 (str | dict): second ssbond residue

The two acceptable formats are shown below:

# string format: "chain_id residue_id"
pdbreader.add_ssbond('PROT_A 50', 'PROT_A 62')

# if passing a dict, it must be structured like below
pdbreader.add_ssbond(
    residue1={'chainIndex': 'PROT_A', 'resid': '50'},
    residue2={'chainIndex': 'PROT_A', 'resid': '62'})
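Because residues can be given either as strings or as dicts, a pair of small converters can keep scripts consistent. The sketch below is illustrative only; to_residue_dict and to_residue_string are hypothetical helpers, and the sample list mirrors the 'ssbonds' entry from get_pdb_info() shown above.

```python
def to_residue_dict(residue):
    """Normalize 'PROT_A 50' or a residue dict into the dict format."""
    if isinstance(residue, dict):
        return {"chainIndex": residue["chainIndex"], "resid": str(residue["resid"])}
    chain_index, resid = residue.split()
    return {"chainIndex": chain_index, "resid": resid}


def to_residue_string(residue):
    """Normalize either format into the 'chain_id residue_id' string format."""
    as_dict = to_residue_dict(residue)
    return f"{as_dict['chainIndex']} {as_dict['resid']}"


# Sample shaped like pdb_info['ssbonds'] above.
ssbonds = [{"residue1": {"chainIndex": "PROT_A", "resid": "2"},
            "residue2": {"chainIndex": "PROT_B", "resid": "2"}}]

pairs = [(to_residue_string(b["residue1"]), to_residue_string(b["residue2"]))
         for b in ssbonds]
print(pairs)
# [('PROT_A 2', 'PROT_B 2')]

# With a real project object, each existing bond could then be removed with:
# for first, second in pairs:
#     pdbreader.remove_ssbond(first, second)
```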

Stapling

Staples are added with add_staple() and removed with remove_staple(). Usage closely follows that of disulfide bonds, except that adding a staple takes one additional argument: the staple type.

To see the valid staple types, use get_valid_staples():

>>> pdbreader.get_valid_staples()
['META3', 'META4', 'META5', 'META6', 'META7',
 'RMETA3', 'RMETA4', 'RMETA5', 'RMETA6',
 'RMETA7', 'DIBM', 'DIBP', 'CR12', 'CR21']

Example usage:

import molcube as mc

molcube = mc.API('api.molcube.com', 443)
molcube.authenticate(api_token=api_access_key)

pdbreader = molcube.create_pdb_reader_project()
assert pdbreader.create_project(title='test-staple', ff='charmmff', customPdb='files/1ubq.pdb')

pdbreader.set_defaults()
pdbreader.add_staple('RMETA3', 'PROT_A 1', 'PROT_A 3')

assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()