Glycosylation
For initial setup, we’ll use a PDB that includes many glycosylations:
import molcube as mc
molcube = mc.API()
# Replace with your actual API token
api_token = 'your_api_token_here'
molcube.authenticate(api_token=api_token)
pdbreader = molcube.create_pdb_reader_project()
# Project title (placeholder value; use any descriptive string)
title = 'glycosylation_tutorial'
pdbreader.create_project(title=title, ff='charmmff',
                         pdbId='3pqr')
pdbreader.set_defaults()
When using set_defaults(), the MolCube-API Client automatically sets the appropriate options for modeling glycans already present in the PDB.
Check default GLYC Chains
The PdbReaderProject has a glycosylations attribute mapping glycan chain ID to an object representing the glycan sequence. When converted to a string (e.g., by printing it), this object shows a condensed representation of the sequence:
>>> from pprint import pprint
>>> pprint(pdbreader.glycosylations)
{'GLYC_A': <Glycosylation: aDMan(1->3)bDMan(1->4)bDGlcNAc(1->4)bDGlcNAc(1->)PROT_A-15>,
'GLYC_B': <Glycosylation: bDGlc>,
'GLYC_C': <Glycosylation: aDGlc(1->1)aDGlc>,
'GLYC_D': <Glycosylation: bDGlcNAc(1->)PROT_A-2>,
'GLYC_E': <Glycosylation: bDGlc>}
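The condensed string can also be taken apart with plain Python, which helps when comparing the default chains against a target structure. The sketch below is not part of the MolCube client: `parse_condensed` and `LINK` are hypothetical names, and the helper only handles linear (unbranched) sequences in the format printed above.

```python
import re

# Hypothetical helper for illustration; not part of the MolCube client.
# Matches a linkage such as "(1->4)" or the open-ended "(1->)".
LINK = re.compile(r"\((\d+)->(\d*)\)")


def parse_condensed(seq):
    """Split a linear condensed sequence into (residue, site1, site2) steps."""
    tokens = LINK.split(seq)  # residue, site1, site2, residue, site1, ...
    steps = []
    for i in range(0, len(tokens), 3):
        if i + 2 < len(tokens):
            steps.append((tokens[i], tokens[i + 1], tokens[i + 2]))
        else:
            # Last residue (the anchor or a free sugar) has no outgoing link
            steps.append((tokens[i], None, None))
    return steps


for step in parse_condensed("bDGlcNAc(1->4)bDGlcNAc(1->)PROT_A-15"):
    print(step)
```

Each tuple reads as "this residue links from its site1 to site2 of the next residue"; an empty site2 marks the attachment to the anchor.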
Understanding Glycosylation Structure
Core Concepts
A glycosylation structure is represented as a tree with:
Root Node: The anchor point (PROTEIN residue or LIPID)
Sugar Nodes: Glycan residues (e.g., DGlc, DMan, DGal)
Links: Connections between nodes with linkage site information
Node Types (GLYP_NODE_TYPE)
| Type | Description | Usage |
|---|---|---|
| PROTEIN | Protein residue (ASN, SER, THR) | Root node only, requires chain and resid |
| LIPID | Lipid residue | Root node only, created via add_lipid_root() method |
| ALPHA (or A) | Alpha-linked sugar | Sugar nodes |
| BETA (or B) | Beta-linked sugar | Sugar nodes |
Important Constraints
PROTEIN root glycosylations
Only ASN, SER, THR residues can host glycans
N-glycans attach to ASN or SER
O-glycans attach to THR only
Requires chain and resid in the root node
Chain index is auto-assigned (GLYC_A, GLYC_B, …)
LIPID root glycosylations
Created by calling add_lipid_root('LIPID_NAME') on an existing glycosylation
Must be based on a non-PROTEIN default glycosylation from get_applied_glycosylations()
Inserts a new LIPID node as root, pushing the original root down one level
Useful for modifying lipid-attached glycans from PDB
Root node can only have ONE direct child
The first sugar attaches directly to the protein/lipid
Additional sugars branch from subsequent sugar nodes
Resid is auto-assigned for sugar nodes
Sugar nodes get sequential resid values (1, 2, 3, …)
Only PROTEIN root requires explicit chain and resid
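To make the constraints above concrete, here is a minimal toy model of the tree rules. It is a sketch only: `ToyGlycan` is a hypothetical stand-in, not the MolCube `Glycosylation` class, and it assumes that sugar resid values simply follow insertion order.

```python
class ToyGlycan:
    """Toy stand-in for a glycan tree; NOT the MolCube Glycosylation class."""

    def __init__(self, root_name, root_kind="PROTEIN"):
        # Node 0 is always the root (PROTEIN or LIPID anchor)
        self.nodes = {0: {"name": root_name, "kind": root_kind,
                          "parent": None, "resid": None}}

    def add_sugar(self, name, kind, parent_id):
        if kind not in ("A", "B"):
            raise ValueError("only A/B sugar nodes can be added as children")
        if parent_id not in self.nodes:
            raise ValueError(f"unknown parent id {parent_id}")
        if parent_id == 0 and any(n["parent"] == 0 for n in self.nodes.values()):
            raise ValueError("root node can only have one direct child")
        new_id = len(self.nodes)  # ids 1, 2, 3, ... in insertion order
        self.nodes[new_id] = {"name": name, "kind": kind,
                              "parent": parent_id, "resid": new_id}
        return new_id


glycan = ToyGlycan("ASN")
first = glycan.add_sugar("DGlcNAc", "B", parent_id=0)
second = glycan.add_sugar("DMan", "B", parent_id=first)
try:
    glycan.add_sugar("DGlcNAc", "B", parent_id=0)  # second child of root
except ValueError as err:
    print(f"rejected: {err}")
```

The real client enforces more than this (residue names, chain and resid lookup in the PDB), so treat the toy only as a reading aid for the rules listed above.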
Building a Glycosylation Structure
Step-by-Step Process
To create a custom glycosylation, follow these steps:
Import required classes
Create a root node (PROTEIN type with chain and resid)
Create a Glycosylation object with the root node
Add sugar nodes using add_sugar() method
Optionally add modifications using set_modification() method
Add to project using add_glycosylation() method
Linkage Sites
When adding sugars, you need to specify:
- `site1`: The linkage site on the child (the sugar being added), typically "1"
- `site2`: The linkage site on the parent (where the sugar attaches)
For root node (PROTEIN/LIPID): usually empty ""
For sugar nodes: "2", "3", "4", "6" depending on the sugar type
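The relationship between `site1`/`site2` and the condensed `(site1->site2)` notation can be sketched with a small recursive renderer. This is an illustration under assumptions: `render`, `labels`, and `children` are hypothetical names, and the branch ordering (first child on the main chain, later children in brackets) is assumed rather than taken from the client's own printer.

```python
def render(node_id, labels, children):
    """Render the subtree under node_id in condensed notation."""
    parts = []
    for i, (child_id, site1, site2) in enumerate(children.get(node_id, [])):
        text = render(child_id, labels, children) + f"({site1}->{site2})"
        # First child continues the main chain; further children are branches
        parts.append(text if i == 0 else f"[{text}]")
    return "".join(parts) + labels[node_id]


# Node labels by id; id 0 is the root residue
labels = {0: "ASN", 1: "bDGlcNAc", 2: "bDMan", 3: "bDMan", 4: "bDMan"}
# parent id -> list of (child id, site1 on child, site2 on parent)
children = {
    0: [(1, "1", "")],
    1: [(2, "1", "4")],
    2: [(3, "1", "6"), (4, "1", "3")],
}
print(render(0, labels, children))
```

Note how the empty `site2` on the root link produces the open-ended `(1->)` directly before the anchor residue.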
Example 1: Building a Simple N-Glycan
We’ll start with a simple N-glycan with 2 sugars.
# Node, Glycosylation and GLYP_NODE_TYPE must already be imported from the
# MolCube client package (the import path is not shown in this section).
# Target structure: bDGlcNAc(1->4)bDGlcNAc(1->)ASN
#
# ASN (root, PROTEIN)
# └── bDGlcNAc (id=1) [site1=1, site2=""]
#     └── bDGlcNAc (id=2) [site1=1, site2=4]
# Step 1: Create root node (PROTEIN type requires chain and resid)
root = Node(
    name="ASN",  # Residue name (must match actual PDB residue)
    # Root type for user-added glycosylation
    type=GLYP_NODE_TYPE.PROTEIN,
    chain="PROT_A",  # Chain index from PDB
    resid="160"  # Residue ID from PDB
)
# Step 2: Create Glycosylation object
# (chain_index will be auto-assigned)
gly = Glycosylation(root=root)
print(f"Root node created: id={root.id}, "
      f"name={root.name}, "
      f"type={root.type}")
print(f"Registry after root: {list(gly._registry.keys())}")
# Step 3: Add first sugar (directly attached to root)
sugar1 = Node(
    name="DGlcNAc",  # Sugar name (case-insensitive)
    type=GLYP_NODE_TYPE.BETA  # Beta linkage (can also use "B")
)
# Note: resid is auto-assigned for sugar nodes (1, 2, 3, ...)
sugar1_id = gly.add_sugar(
    node=sugar1,
    parent_id=0,  # 0 is the root node id
    site1="1",  # Child linkage site (usually "1" for sugars)
    site2=""  # Parent site (empty for root)
)
print(f"Sugar1 added: id={sugar1_id}, "
      f"name={sugar1.name}, "
      f"resid={sugar1.resid}")
# Step 4: Add second sugar (attached to sugar1 at site 4)
sugar2 = Node(
    name="DGlcNAc",
    type="B"  # String "B" also works for BETA
)
sugar2_id = gly.add_sugar(
    node=sugar2,
    parent_id=sugar1_id,  # Attach to sugar1
    site1="1",  # Child linkage site
    site2="4"  # Parent linkage site (1->4 linkage)
)
print(f"Sugar2 added: id={sugar2_id}, "
      f"name={sugar2.name}, "
      f"resid={sugar2.resid}")
# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly)
assert state, "Failed to get glycosylation"
print(state)
# Add to the project, then model and download the result
pdbreader.add_glycosylation(gly)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()
pdbreader.download_project('glycan_1.pdb')
Example 2: Branched Glycan Structure
This time we’ll build a more complex one with branching.
# Target structure: bDMan(1->6)[bDMan(1->3)]bDMan(1->4)bDGlcNAc(1->)ASN
#
# ASN (root)
# └── bDGlcNAc (id=1) [1->]
#     └── bDMan (id=2) [1->4]
#         ├── bDMan (id=3) [1->6] ← branch 1
#         └── bDMan (id=4) [1->3] ← branch 2
# Create root node
root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="315")
gly_branched = Glycosylation(root=root)
# Add first sugar (core GlcNAc)
sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
sugar1_id = gly_branched.add_sugar(sugar1, parent_id=0, site1="1", site2="")
# Add second sugar (core Man)
sugar2 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
sugar2_id = gly_branched.add_sugar(sugar2, parent_id=sugar1_id, site1="1", site2="4")
# Add branch 1: Man at site 6
branch1 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
branch1_id = gly_branched.add_sugar(branch1, parent_id=sugar2_id, site1="1", site2="6")
# Add branch 2: Man at site 3
branch2 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
branch2_id = gly_branched.add_sugar(branch2, parent_id=sugar2_id, site1="1", site2="3")
# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly_branched)
assert state, "Failed to get glycosylation"
print(state)
pdbreader.add_glycosylation(gly_branched)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()
pdbreader.download_project('glycan_2.pdb')
Example 3: Glycan with Modifications
You can add modifications (like sulfate) to sugar nodes using set_modification().
Important constraints:
Modifications can only be added to sugar nodes (A or B type), not to root nodes
A site cannot be used for both modification and child linkage
Available modifications can be checked in the output of get_available_glycosylation_info()
root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
gly_mod = Glycosylation(root=root)
# Add sugar
sugar = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
sugar_id = gly_mod.add_sugar(sugar, parent_id=0, site1="1", site2="")
# Add modification to sugar node
# Parameters: node_id, site (key), modification value
gly_mod.set_modification(node_id=sugar_id, key="6", value="S") # Sulfate at site 6
print("Glycan with modification:")
# Note: This would fail - cannot add modification to root node
#gly_mod.set_modification(node_id=0, key="6", value="S")
# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly_mod)
assert state, "Failed to get glycosylation"
print(state)
pdbreader.add_glycosylation(gly_mod)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()
pdbreader.download_project('glycan_3.pdb')
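The rule that one site cannot carry both a modification and a child linkage can be checked locally before anything is sent to the server. The helper below is a hypothetical sketch, not a MolCube API; it assumes modifications are tracked as a site-to-value mapping and child links as `(site1, site2)` pairs for a single sugar.

```python
def conflicting_sites(modifications, child_links):
    """Return parent sites used for both a modification and a child linkage.

    modifications: dict mapping site -> modification value, e.g. {"6": "S"}
    child_links:   list of (site1, site2) pairs for children of the same sugar
    """
    linked_sites = {site2 for _site1, site2 in child_links}
    return sorted(set(modifications) & linked_sites)


print(conflicting_sites({"6": "S"}, [("1", "4")]))  # no overlap
print(conflicting_sites({"6": "S"}, [("1", "6")]))  # site 6 used twice
```

An empty result means the sugar's modifications and child linkages use distinct sites.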
Example 4: Using Predefined N-Glycan (M3)
The system provides commonly used glycan structures that can be applied with a single call.
Available Predefined Glycans (PREDEFINED_GLYCAN):
| Category | Glycans |
|---|---|
| N-glycans (attach to ASN or SER) | M3, M5, M9, FA2G2S2, FA3G3S3, FA4G4S4, FA2BG2S2, Hybrid |
| O-glycans (attach to THR) | Core_1 ~ Core_8, Extended_Core_1 ~ Extended_Core_4 |
# Create root (ASN for N-glycans)
root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
gly_m3 = Glycosylation(root=root)
# Apply M3 predefined glycan
gly_m3.apply_predefined_glycan(PREDEFINED_GLYCAN.M3) # or "M3"
print("M3 N-glycan structure:")
print(f"Total nodes: {len(gly_m3._registry)}")
# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly_m3)
assert state, "Failed to get glycosylation"
print(state)
pdbreader.add_glycosylation(gly_m3)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()
pdbreader.download_project('glycan_4.pdb')
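Because each predefined glycan is tied to specific root residues, it can be convenient to check compatibility before building the root node. The lookup below is a hypothetical sketch that mirrors the table of predefined glycans in this example; it is not a MolCube API, and it assumes "Core_1 ~ Core_8" and "Extended_Core_1 ~ Extended_Core_4" denote inclusive ranges.

```python
# Hypothetical lookup that mirrors the predefined-glycan table; not a MolCube API.
N_GLYCANS = {"M3", "M5", "M9", "FA2G2S2", "FA3G3S3", "FA4G4S4",
             "FA2BG2S2", "Hybrid"}
# Assumes the "~" ranges in the table are inclusive.
O_GLYCANS = ({f"Core_{i}" for i in range(1, 9)}
             | {f"Extended_Core_{i}" for i in range(1, 5)})


def allowed_root_residues(glycan_name):
    """Return the root residue names the table lists for a predefined glycan."""
    if glycan_name in N_GLYCANS:
        return {"ASN", "SER"}
    if glycan_name in O_GLYCANS:
        return {"THR"}
    raise ValueError(f"unknown predefined glycan: {glycan_name}")


def is_compatible(glycan_name, residue_name):
    """True if the residue may host the given predefined glycan."""
    return residue_name.upper() in allowed_root_residues(glycan_name)


print(is_compatible("M3", "ASN"))
print(is_compatible("M3", "THR"))
```

A failed check here corresponds to the case where apply_predefined_glycan() is called with an N-glycan on a THR root.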
Managing Glycosylations in PDBReader
Once you’ve built a Glycosylation object, you can manage it using these methods:
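| Method | Description |
|---|---|
| add_glycosylation() | Add a new glycosylation to the project |
| update_glycosylation() | Replace an existing glycosylation |
| delete_glycosylation() | Remove a glycosylation |
| reset_glycosylations() | Reset to default glycosylations |
| get_glycosylation_state() | View glycosylation structure |
| get_applied_glycosylations() | Get all registered glycosylations |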
add_glycosylation()
Adds a new glycosylation to the project. The chain_index is automatically assigned.
Constraints:
Root must be PROTEIN type (not LIPID)
Root residue (chain + resid) must exist in the PDB
Root residue name must match actual residue in PDB
Cannot add if chain_index is manually set
# Check current glycosylation chain indices
print("Before adding:")
print(f"Chain Indices: {pdbreader.glyc_chain_indices}")
# Add the glycosylation we built earlier (gly with 2 sugars)
pdbreader.add_glycosylation(gly)
print(f"\nAssigned chain_index: {gly.chain_index}")
print(f"After adding - Chain Indices: {pdbreader.glyc_chain_indices}")
# View the added glycosylation
print(pdbreader.get_glycosylation_state(gly.chain_index))
update_glycosylation()
Replaces an existing glycosylation with a new one. The chain_index must already exist.
# Create a new glycosylation to replace the existing one
root_update = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN,
                   chain="PROT_A", resid="160")
gly_update = Glycosylation(root=root_update)
# Use M5 predefined glycan
gly_update.apply_predefined_glycan(PREDEFINED_GLYCAN.M5)
# Update the existing glycosylation
chain_to_update = gly.chain_index
pdbreader.update_glycosylation(chain_to_update, gly_update)
print(f"\nUpdated glycosylation at {chain_to_update}:")
print(pdbreader.get_glycosylation_state(chain_to_update))
delete_glycosylation()
Removes a glycosylation from the project by its chain index.
# Delete the glycosylation we just added/updated
print(f"Before delete - Chain Indices: {pdbreader.glyc_chain_indices}")
pdbreader.delete_glycosylation(chain_to_update)
print(f"After delete - Chain Indices: {pdbreader.glyc_chain_indices}")
reset_glycosylations()
Resets all glycosylations to the server’s default state (original PDB glycosylations).
# Reset to default glycosylations
print("Before reset:")
print(f"Chain Indices: {pdbreader.glyc_chain_indices}")
pdbreader.reset_glycosylations()
print("\nAfter reset:")
print(f"Chain Indices: {pdbreader.glyc_chain_indices}")
Common Errors and Constraints
Below are common error scenarios to be aware of.
# ============================================================
# Example 5: LIPID Root Glycosylation using add_lipid_root()
# ============================================================
# LIPID root is added above an existing non-PROTEIN default glycosylation
#
# Before: OldRoot(type=B) → Sugar1 → Sugar2
# After: LIPID_NAME(type=LIPID) → OldRoot(type=B) → Sugar1 → Sugar2
#
# NOTE: This feature requires:
# 1. The PDB to have non-PROTEIN root glycosylations (glycolipid structures)
# 2. A valid lipid name from avail_glyc_lipids in available_glycosylation.json
# Check which default glycosylations have non-PROTEIN roots
print("Checking default glycosylations for non-PROTEIN roots...")
default_glycs = pdbreader.get_applied_glycosylations()
avail_add_lipid_obj = []
for chain_idx, gly_obj in default_glycs.items():
    root_type = gly_obj.root.type.upper()
    if root_type != "PROTEIN":
        print(f" ✓ {chain_idx}: {gly_obj.root.name} (type={root_type}) - Can use add_lipid_root()")
        avail_add_lipid_obj.append(gly_obj)
    else:
        print(f" ✗ {chain_idx}: {gly_obj.root.name} (type=PROTEIN) - Cannot use add_lipid_root()")
add_lipid_obj = avail_add_lipid_obj[0]  # Using the first available glycosylation object
print("Before add_lipid_root:")
print(f" root.id={add_lipid_obj.root.id}, root.name={add_lipid_obj.root.name}, root.type={add_lipid_obj.root.type}")
print(pdbreader.get_glycosylation_state(add_lipid_obj))
# Add LIPID root above existing structure
# Use a lipid name from available_glycosylation.json -> avail_glyc_lipids
# LIPID root gets id = -1, existing nodes keep their original IDs
add_lipid_obj.add_lipid_root("CER160")
print("After add_lipid_root:")
print(f" root.id={add_lipid_obj.root.id}, root.name={add_lipid_obj.root.name}, root.type={add_lipid_obj.root.type}")
print(" Original root (id=0) is now child of LIPID (id=-1)")
print(pdbreader.get_glycosylation_state(add_lipid_obj))
# Run model_pdb with LIPID root glycosylation applied
print("Running model_pdb with LIPID root...")
pdbreader.model_pdb()
# ============================================================
# Error Case 1: add_lipid_root() on glycosylation without chain_index
# ============================================================
# add_lipid_root() requires the glycosylation to have a chain_index (from default glycosylations)
try:
    # Create a new glycosylation (doesn't have chain_index yet)
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_new = Glycosylation(root=root)
    # Try to add lipid root - ERROR: no chain_index
    gly_new.add_lipid_root("CER160")
except ValueError as e:
    print(f"✗ Error: {e}")
# ============================================================
# Error Case 2: add_lipid_root() on PROTEIN-rooted default glycosylation
# ============================================================
# add_lipid_root() can only be used on non-PROTEIN default glycosylations
# Find a PROTEIN-rooted default glycosylation to demonstrate the error
default_glycs = pdbreader.get_applied_glycosylations()
protein_chain_idx = None
for chain_idx, gly_obj in default_glycs.items():
    if gly_obj.root.type.upper() == "PROTEIN":
        protein_chain_idx = chain_idx
        break
if protein_chain_idx:
    try:
        # Get PROTEIN-rooted glycosylation
        gly_prot = default_glycs[protein_chain_idx]
        # Try to add lipid root - ERROR: cannot add LIPID above PROTEIN
        gly_prot.add_lipid_root("CER160")
    except ValueError as e:
        print(f"✗ Error: {e}")
else:
    print("No PROTEIN-rooted default glycosylation to demonstrate this error")
# ============================================================
# Error Case 3: Root node can only have ONE child
# ============================================================
try:
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_err = Glycosylation(root=root)
    # First child - OK
    sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    gly_err.add_sugar(sugar1, parent_id=0, site1="1", site2="")
    # Second child to root - ERROR
    sugar2 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    gly_err.add_sugar(sugar2, parent_id=0, site1="1", site2="")
    pdbreader.add_glycosylation(gly_err)
except ValueError as e:
    print(f"✗ Error: {e}")
# ============================================================
# Error Case 4: PROTEIN/LIPID cannot be child nodes
# ============================================================
try:
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_err = Glycosylation(root=root)
    # Try to add PROTEIN as child - ERROR
    wrong_child = Node(name="THR", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="161")
    gly_err.add_sugar(wrong_child, parent_id=0, site1="1", site2="")
except ValueError as e:
    print(f"✗ Error: {e}")
# ============================================================
# Error Case 5: Same site cannot have both modification and linkage
# ============================================================
try:
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_err = Glycosylation(root=root)
    sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    sugar1_id = gly_err.add_sugar(sugar1, parent_id=0, site1="1", site2="")
    # Add modification at site "6"
    gly_err.set_modification(node_id=sugar1_id, key="6", value="S")
    # Try to add child at same site "6" - ERROR (during validation)
    sugar2 = Node(name="DGlc", type=GLYP_NODE_TYPE.BETA)
    gly_err.add_sugar(sugar2, parent_id=sugar1_id, site1="1", site2="6")
    pdbreader.add_glycosylation(gly_err)
except ValueError as e:
    print(f"✗ Error: {e}")
# ============================================================
# Error Case 6: Wrong predefined glycan for root residue type
# ============================================================
try:
    # THR is for O-glycans, but M3 is an N-glycan
    root = Node(name="THR", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="200")
    gly_err = Glycosylation(root=root)
    # N-glycan on THR - ERROR
    gly_err.apply_predefined_glycan(PREDEFINED_GLYCAN.M3)
except ValueError as e:
    print(f"✗ Error: {e}")