Glycosylation

For initial setup, we’ll use a PDB that includes many glycosylations:

import molcube as mc

molcube = mc.API()

# Replace with your actual API token
api_token = 'your_api_token_here'

molcube.authenticate(api_token=api_token)

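# Any descriptive project title works; this one is only an example
title = 'Glycosylation tutorial'

pdbreader = molcube.create_pdb_reader_project()
pdbreader.create_project(title=title, ff='charmmff',
                         pdbId='3pqr')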
pdbreader.set_defaults()

When you call set_defaults(), the MolCube-API client automatically selects the options needed to model the glycans already present in the PDB.

Check default GLYC Chains

The PdbReaderProject has a glycosylations attribute mapping glycan chain ID to an object representing the glycan sequence. When converted to a string (e.g., by printing it), this object shows a condensed representation of the sequence:

>>> from pprint import pprint
>>> pprint(pdbreader.glycosylations)
{'GLYC_A': <Glycosylation: aDMan(1->3)bDMan(1->4)bDGlcNAc(1->4)bDGlcNAc(1->)PROT_A-15>,
 'GLYC_B': <Glycosylation: bDGlc>,
 'GLYC_C': <Glycosylation: aDGlc(1->1)aDGlc>,
 'GLYC_D': <Glycosylation: bDGlcNAc(1->)PROT_A-2>,
 'GLYC_E': <Glycosylation: bDGlc>}
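
To make the condensed notation concrete, here is a minimal sketch that splits such a string into its residues and linkage sites. It assumes only the format visible in the output above: an anomer letter (a or b), a sugar name, an optional (site1->site2) linkage, and an optional trailing root label such as PROT_A-15. The helper parse_condensed is hypothetical and not part of the MolCube client, and branched sequences written with [...] are not handled.

```python
import re

# One residue token: anomer (a/b), sugar name, optional "(site1->site2)" linkage.
TOKEN = re.compile(r"([ab])([A-Z][A-Za-z0-9]*)(?:\((\d*)->(\d*)\))?")

def parse_condensed(seq):
    """Split a linear condensed glycan string into (residues, root).

    Hypothetical helper for illustration; handles unbranched sequences only.
    """
    residues = []
    pos = 0
    while pos < len(seq):
        m = TOKEN.match(seq, pos)
        if m is None:
            break  # whatever remains is treated as the root label
        anomer, name, site1, site2 = m.groups()
        residues.append({
            "anomer": anomer,
            "name": name,
            "site1": site1 or "",
            "site2": site2 or "",
        })
        pos = m.end()
    root = seq[pos:] or None
    return residues, root

residues, root = parse_condensed(
    "aDMan(1->3)bDMan(1->4)bDGlcNAc(1->4)bDGlcNAc(1->)PROT_A-15"
)
for r in residues:
    print(r)
print("root:", root)
```

Residues come out in the order they are written, so the outermost sugar is first and the sugar attached to the root is last.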

Understanding Glycosylation Structure

Core Concepts

A glycosylation structure is represented as a tree with:

  • Root Node: The anchor point (PROTEIN residue or LIPID)

  • Sugar Nodes: Glycan residues (e.g., DGlc, DMan, DGal)

  • Links: Connections between nodes with linkage site information
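
The tree idea can be pictured with a small stand-alone sketch. ToyNode and outline below are illustrative names only, not the MolCube Node or Glycosylation classes; the sketch just shows a root with sugar nodes beneath it, each carrying the linkage to its parent, printed as an indented outline.

```python
from dataclasses import dataclass, field

@dataclass
class ToyNode:
    """Illustrative stand-in for a tree node; not the MolCube Node class."""
    name: str
    kind: str                      # "PROTEIN", "LIPID", "A", or "B"
    link: str = ""                 # linkage to the parent, e.g. "1->4"
    children: list = field(default_factory=list)

def outline(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    label = f"{node.name} [{node.link}]" if node.link else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines

# Root (protein residue) -> GlcNAc -> Man carrying two branches
root = ToyNode("ASN", "PROTEIN", children=[
    ToyNode("bDGlcNAc", "B", "1->", [
        ToyNode("bDMan", "B", "1->4", [
            ToyNode("bDMan", "B", "1->6"),
            ToyNode("bDMan", "B", "1->3"),
        ]),
    ]),
])
print("\n".join(outline(root)))
```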

Node Types (GLYP_NODE_TYPE)

Type            Description                        Usage
--------------  ---------------------------------  ---------------------------------------------------
PROTEIN         Protein residue (ASN, SER, THR)    Root node only, requires chain and resid
LIPID           Lipid residue                      Root node only, created via add_lipid_root() method
ALPHA (or A)    Alpha-linked sugar                 Sugar nodes
BETA (or B)     Beta-linked sugar                  Sugar nodes

Important Constraints

  1. PROTEIN root glycosylations

    • Only ASN, SER, THR residues can host glycans

    • N-glycans attach to ASN or SER

    • O-glycans attach to THR only

    • Requires chain and resid in the root node

    • Chain index is auto-assigned (GLYC_A, GLYC_B, …)

  2. LIPID root glycosylations

    • Created by calling add_lipid_root("LIPID_NAME") on an existing glycosylation

    • Must be based on a non-PROTEIN default glycosylation from get_applied_glycosylations()

    • Inserts a new LIPID node as root, pushing the original root down one level

    • Useful for modifying lipid-attached glycans from PDB

  3. Root node can only have ONE direct child

    • The first sugar attaches directly to the protein/lipid

    • Additional sugars branch from subsequent sugar nodes

  4. Resid is auto-assigned for sugar nodes

    • Sugar nodes get sequential resid values (1, 2, 3, …)

    • Only PROTEIN root requires explicit chain and resid
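
The structural constraints above can be sketched with a toy model. ToyGlycan is a hypothetical class written for this illustration, not the MolCube Glycosylation class, and its error messages are made up. It mirrors three rules from the list: the root accepts only one direct child, sugar nodes receive sequential resid values, and a LIPID root is inserted above an existing non-PROTEIN root with id -1.

```python
class ToyGlycan:
    """Toy model of the constraints listed above; not the MolCube API."""

    def __init__(self, root_name, root_type):
        # registry maps node id -> node record; the initial root has id 0
        self.registry = {0: {"name": root_name, "type": root_type, "parent": None}}
        self.root_id = 0
        self._next_id = 1

    def children_of(self, node_id):
        return [i for i, n in self.registry.items() if n["parent"] == node_id]

    def add_sugar(self, name, sugar_type, parent_id):
        if sugar_type not in ("A", "B"):
            raise ValueError("only A or B nodes can be added as children")
        if parent_id not in self.registry:
            raise ValueError(f"unknown parent id {parent_id}")
        if parent_id == self.root_id and self.children_of(parent_id):
            raise ValueError("root node can only have one direct child")
        node_id = self._next_id
        self._next_id += 1
        # resid is assigned sequentially rather than supplied by the caller
        self.registry[node_id] = {"name": name, "type": sugar_type,
                                  "parent": parent_id, "resid": node_id}
        return node_id

    def add_lipid_root(self, lipid_name):
        if self.registry[self.root_id]["type"] in ("PROTEIN", "LIPID"):
            raise ValueError("a LIPID root can only be placed above a sugar root")
        # the new root takes id -1; existing nodes keep their ids
        self.registry[-1] = {"name": lipid_name, "type": "LIPID", "parent": None}
        self.registry[self.root_id]["parent"] = -1
        self.root_id = -1

glycan = ToyGlycan("ASN", "PROTEIN")
first = glycan.add_sugar("DGlcNAc", "B", parent_id=0)
glycan.add_sugar("DMan", "B", parent_id=first)
try:
    glycan.add_sugar("DGlcNAc", "B", parent_id=0)  # second child on the root
except ValueError as err:
    print(f"rejected: {err}")
```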

Building a Glycosylation Structure

Step-by-Step Process

To create a custom glycosylation, follow these steps:

  1. Import required classes

  2. Create a root node (PROTEIN type with chain and resid)

  3. Create a Glycosylation object with the root node

  4. Add sugar nodes using add_sugar() method

  5. Optionally add modifications using set_modification() method

  6. Add to project using add_glycosylation() method

Linkage Sites

When adding sugars, you need to specify:

  • site1: The linkage site on the child (the sugar being added), typically "1"

  • site2: The linkage site on the parent (where the sugar attaches)

    • For a root node (PROTEIN/LIPID): usually empty ("")

    • For sugar nodes: "2", "3", "4", or "6", depending on the sugar type
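
As a rough illustration of how site1 and site2 show up in the condensed notation, the sketch below formats each residue as name(site1->site2) and joins a linear chain. Both functions are hypothetical helpers written for this page, not MolCube calls, and the output format is inferred from the condensed strings shown in this document.

```python
def residue_token(anomer, name, site1, site2):
    """Format one residue as e.g. 'bDGlcNAc(1->4)'.

    site1 is the site on the sugar being added (child);
    site2 is the site on the parent it attaches to ("" for a root parent).
    """
    return f"{anomer.lower()}{name}({site1}->{site2})"

def condensed_chain(residues, root_label):
    """Join a root-to-tip list of residues into a condensed string.

    The outermost sugar is written first, so the list is reversed.
    """
    return "".join(residue_token(*r) for r in reversed(residues)) + root_label

# A two-sugar N-glycan, listed from the root outward
chain = [
    ("B", "DGlcNAc", "1", ""),   # attached to the root: site2 is empty
    ("B", "DGlcNAc", "1", "4"),  # attached to site 4 of the first sugar
]
print(condensed_chain(chain, "ASN"))  # bDGlcNAc(1->4)bDGlcNAc(1->)ASN
```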

Example 1: Building a Simple N-Glycan

We’ll start with a simple N-glycan with 2 sugars.

# Target structure: bDGlcNAc(1->4)bDGlcNAc(1->)ASN
#
# ASN (root, PROTEIN)
#  └── bDGlcNAc (id=1) [site1=1, site2=""]
#       └── bDGlcNAc (id=2) [site1=1, site2=4]

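# Prerequisite: Node, Glycosylation, and GLYP_NODE_TYPE must already be imported
# from the MolCube client (the import statement is not shown in this example).
# Step 1: Create root node (PROTEIN type requires chain and resid)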
root = Node(
    name="ASN", # Residue name (must match actual PDB residue)
    # Root type for user-added glycosylation
    type=GLYP_NODE_TYPE.PROTEIN,
    chain="PROT_A", # Chain index from PDB
    resid="160"     # Residue ID from PDB
)

# Step 2: Create Glycosylation object
# (chain_index will be auto-assigned)
gly = Glycosylation(root=root)

print(f"Root node created: id={root.id}, "
      f"name={root.name}, "
      f"type={root.type}")
print(f"Registry after root: {list(gly._registry.keys())}")

# Step 3: Add first sugar (directly attached to root)
sugar1 = Node(
    name="DGlcNAc",           # Sugar name (case-insensitive)
    type=GLYP_NODE_TYPE.BETA  # Beta linkage (can also use "B")
)
# Note: resid is auto-assigned for sugar nodes (1, 2, 3, ...)

sugar1_id = gly.add_sugar(
    node=sugar1,
    parent_id=0, # 0 is the root node id
    site1="1",   # Child linkage site (usually "1" for sugars)
    site2=""     # Parent site (empty for root)
)

print(f"Sugar1 added: id={sugar1_id}, "
      f"name={sugar1.name}, "
      f"resid={sugar1.resid}")

# Step 4: Add second sugar (attached to sugar1 at site 4)
sugar2 = Node(
    name="DGlcNAc",
    type="B"  # String "B" also works for BETA
)

sugar2_id = gly.add_sugar(
    node=sugar2,
    parent_id=sugar1_id,  # Attach to sugar1
    site1="1",  # Child linkage site
    site2="4"   # Parent linkage site (1->4 linkage)
)

print(f"Sugar2 added: id={sugar2_id}, "
      f"name={sugar2.name}, "
      f"resid={sugar2.resid}")

# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly)
assert state, "Failed to get glycosylation"
print(state)

pdbreader.add_glycosylation(gly)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()

pdbreader.download_project('glycan_1.pdb')

Example 2: Branched Glycan Structure

This time we’ll build a more complex one with branching.

# Target structure: bDMan(1->6)[bDMan(1->3)]bDMan(1->4)bDGlcNAc(1->)ASN
#
# ASN (root)
#  └── bDGlcNAc (id=1) [1->]
#       └── bDMan (id=2) [1->4]
#            ├── bDMan (id=3) [1->6] ← branch 1
#            └── bDMan (id=4) [1->3] ← branch 2

# Create root node
root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="315")
gly_branched = Glycosylation(root=root)

# Add first sugar (core GlcNAc)
sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
sugar1_id = gly_branched.add_sugar(sugar1, parent_id=0, site1="1", site2="")

# Add second sugar (core Man)
sugar2 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
sugar2_id = gly_branched.add_sugar(sugar2, parent_id=sugar1_id, site1="1", site2="4")

# Add branch 1: Man at site 6
branch1 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
branch1_id = gly_branched.add_sugar(branch1, parent_id=sugar2_id, site1="1", site2="6")

# Add branch 2: Man at site 3
branch2 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
branch2_id = gly_branched.add_sugar(branch2, parent_id=sugar2_id, site1="1", site2="3")

# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly_branched)
assert state, "Failed to get glycosylation"
print(state)

pdbreader.add_glycosylation(gly_branched)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()

pdbreader.download_project('glycan_2.pdb')

Example 3: Glycan with Modifications

You can add modifications (like sulfate) to sugar nodes using set_modification().

Important constraints:

  • Modifications can only be added to sugar nodes (A or B type), not to root nodes

  • A site cannot be used for both modification and child linkage

  • Available modifications can be checked in get_available_glycosylation_info() output

root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
gly_mod = Glycosylation(root=root)

# Add sugar
sugar = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
sugar_id = gly_mod.add_sugar(sugar, parent_id=0, site1="1", site2="")

# Add modification to sugar node
# Parameters: node_id, site (key), modification value
gly_mod.set_modification(node_id=sugar_id, key="6", value="S")  # Sulfate at site 6

# Note: This would fail - cannot add modification to root node
#gly_mod.set_modification(node_id=0, key="6", value="S")

# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly_mod)
assert state, "Failed to get glycosylation"
print("Glycan with modification:")
print(state)

pdbreader.add_glycosylation(gly_mod)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()

pdbreader.download_project('glycan_3.pdb')
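
The rule that a site cannot serve as both a modification site and a child linkage site can be pictured as a simple set intersection. check_site_conflicts is a hypothetical helper for illustration; how the MolCube client performs this validation internally is not shown in this document.

```python
def check_site_conflicts(modifications, child_links):
    """Return the sites on one sugar used for both a modification and a child.

    modifications: dict mapping site -> modification code, e.g. {"6": "S"}
    child_links:   list of parent-side sites (site2) used by child sugars
    """
    return sorted(set(modifications) & set(child_links))

print(check_site_conflicts({"6": "S"}, ["4"]))       # []
print(check_site_conflicts({"6": "S"}, ["4", "6"]))  # ['6']
```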

Example 4: Using Predefined N-Glycan (M3)

The system provides commonly used glycan structures that can be applied with a single call.

Available Predefined Glycans (PREDEFINED_GLYCAN):

Category                           Glycans
---------------------------------  -------------------------------------------------------
N-glycans (attach to ASN or SER)   M3, M5, M9, FA2G2S2, FA3G3S3, FA4G4S4, FA2BG2S2, Hybrid
O-glycans (attach to THR)          Core_1 ~ Core_8, Extended_Core_1 ~ Extended_Core_4

# Create root (ASN for N-glycans)
root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
gly_m3 = Glycosylation(root=root)

# Apply M3 predefined glycan
gly_m3.apply_predefined_glycan(PREDEFINED_GLYCAN.M3)  # or "M3"

print("M3 N-glycan structure:")
print(f"Total nodes: {len(gly_m3._registry)}")

# Check the structure before adding to project
state = pdbreader.get_glycosylation_state(gly_m3)
assert state, "Failed to get glycosylation"
print(state)

pdbreader.add_glycosylation(gly_m3)
assert pdbreader.confirm_chains()
assert pdbreader.model_pdb()

pdbreader.download_project('glycan_4.pdb')
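
The pairing between predefined glycans and root residues listed in the table for this example can be written as a small lookup, which is handy for checking a root before calling apply_predefined_glycan(). This is a sketch transcribed from that table, not the client's own validation. It assumes that "Core_1 ~ Core_8" and "Extended_Core_1 ~ Extended_Core_4" denote consecutively numbered names, and allowed_roots and is_compatible are hypothetical helpers.

```python
# Names transcribed from the predefined glycan table; the numbered ranges
# are expanded under the assumption that they are consecutive.
N_GLYCANS = {"M3", "M5", "M9", "FA2G2S2", "FA3G3S3", "FA4G4S4", "FA2BG2S2", "Hybrid"}
O_GLYCANS = ({f"Core_{i}" for i in range(1, 9)}
             | {f"Extended_Core_{i}" for i in range(1, 5)})

def allowed_roots(glycan_name):
    """Return the residue names the table lists for a predefined glycan."""
    if glycan_name in N_GLYCANS:
        return {"ASN", "SER"}
    if glycan_name in O_GLYCANS:
        return {"THR"}
    raise ValueError(f"unknown predefined glycan: {glycan_name}")

def is_compatible(glycan_name, root_residue):
    """True if the table allows this glycan on the given root residue."""
    return root_residue.upper() in allowed_roots(glycan_name)

print(is_compatible("M3", "ASN"))   # True
print(is_compatible("M3", "THR"))   # False
```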

Managing Glycosylations in PDBReader

Once you’ve built a Glycosylation object, you can manage it using these methods:

Method                                   Description
---------------------------------------  ---------------------------------------
add_glycosylation(gly)                   Add a new glycosylation to the project
update_glycosylation(chain_index, gly)   Replace an existing glycosylation
delete_glycosylation(chain_index)        Remove a glycosylation
reset_glycosylations()                   Reset to default glycosylations
get_glycosylation_state(target)          View glycosylation structure
get_applied_glycosylations()             Get all registered glycosylations

add_glycosylation()

Adds a new glycosylation to the project. The chain_index is automatically assigned.

Constraints:

  • Root must be PROTEIN type (not LIPID)

  • Root residue (chain + resid) must exist in the PDB

  • Root residue name must match actual residue in PDB

  • Cannot add if chain_index is manually set

# Check current glycosylation chain indices
print("Before adding:")
print(f"Chain Indices: {pdbreader.glyc_chain_indices}")

# Add the glycosylation we built earlier (gly with 2 sugars)
pdbreader.add_glycosylation(gly)

print(f"\nAssigned chain_index: {gly.chain_index}")
print(f"After adding - Chain Indices: {pdbreader.glyc_chain_indices}")

# View the added glycosylation
print(pdbreader.get_glycosylation_state(gly.chain_index))

update_glycosylation()

Replaces an existing glycosylation with a new one. The chain_index must already exist.

# Create a new glycosylation to replace the existing one
root_update = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN,
                   chain="PROT_A", resid="160")
gly_update = Glycosylation(root=root_update)

# Use M5 predefined glycan
gly_update.apply_predefined_glycan(PREDEFINED_GLYCAN.M5)

# Update the existing glycosylation
chain_to_update = gly.chain_index
pdbreader.update_glycosylation(chain_to_update, gly_update)

print(f"\nUpdated glycosylation at {chain_to_update}:")
print(pdbreader.get_glycosylation_state(chain_to_update))

delete_glycosylation()

Removes a glycosylation from the project by its chain index.

# Delete the glycosylation we just added/updated
print(f"Before delete - Chain Indices: {pdbreader.glyc_chain_indices}")

pdbreader.delete_glycosylation(chain_to_update)

print(f"After delete - Chain Indices: {pdbreader.glyc_chain_indices}")

reset_glycosylations()

Resets all glycosylations to the server’s default state (original PDB glycosylations).

# Reset to default glycosylations
print("Before reset:")
print(f"Chain Indices: {pdbreader.glyc_chain_indices}")

pdbreader.reset_glycosylations()

print("\nAfter reset:")
print(f"Chain Indices: {pdbreader.glyc_chain_indices}")

Common Errors and Constraints

The code below first shows Example 5, which adds a LIPID root with add_lipid_root(), and then walks through common error scenarios to be aware of.

# ============================================================
# Example 5: LIPID Root Glycosylation using add_lipid_root()
# ============================================================
# LIPID root is added above an existing non-PROTEIN default glycosylation
#
# Before: OldRoot(type=B) → Sugar1 → Sugar2
# After:  LIPID_NAME(type=LIPID) → OldRoot(type=B) → Sugar1 → Sugar2
#
# NOTE: This feature requires:
#   1. The PDB to have non-PROTEIN root glycosylations (glycolipid structures)
#   2. A valid lipid name from avail_glyc_lipids in available_glycosylation.json

# Check which default glycosylations have non-PROTEIN roots
print("Checking default glycosylations for non-PROTEIN roots...")
default_glycs = pdbreader.get_applied_glycosylations()

avail_add_lipid_obj = []
for chain_idx, gly_obj in default_glycs.items():
    root_type = gly_obj.root.type.upper()
    if root_type != "PROTEIN":
        print(f"  ✓ {chain_idx}: {gly_obj.root.name} (type={root_type}) - Can use add_lipid_root()")
        avail_add_lipid_obj.append(gly_obj)
    else:
        print(f"  ✗ {chain_idx}: {gly_obj.root.name} (type=PROTEIN) - Cannot use add_lipid_root()")

if not avail_add_lipid_obj:
    raise RuntimeError("No non-PROTEIN default glycosylation found; add_lipid_root() cannot be used")
add_lipid_obj = avail_add_lipid_obj[0]  # Use the first available glycosylation object

print("Before add_lipid_root:")
print(f"  root.id={add_lipid_obj.root.id}, root.name={add_lipid_obj.root.name}, root.type={add_lipid_obj.root.type}")
print(pdbreader.get_glycosylation_state(add_lipid_obj))

# Add LIPID root above existing structure
# Use a lipid name from available_glycosylation.json -> avail_glyc_lipids
# LIPID root gets id = -1, existing nodes keep their original IDs
add_lipid_obj.add_lipid_root("CER160")

print("After add_lipid_root:")
print(f"  root.id={add_lipid_obj.root.id}, root.name={add_lipid_obj.root.name}, root.type={add_lipid_obj.root.type}")
print("  Original root (id=0) is now child of LIPID (id=-1)")
print(pdbreader.get_glycosylation_state(add_lipid_obj))

# Run model_pdb with LIPID root glycosylation applied
print("Running model_pdb with LIPID root...")
assert pdbreader.model_pdb()

# ============================================================
# Error Case 1: add_lipid_root() on glycosylation without chain_index
# ============================================================
# add_lipid_root() requires the glycosylation to have a chain_index (from default glycosylations)

try:
    # Create a new glycosylation (doesn't have chain_index yet)
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_new = Glycosylation(root=root)

    # Try to add lipid root - ERROR: no chain_index
    gly_new.add_lipid_root("CER160")
except ValueError as e:
    print(f"✗ Error: {e}")

# ============================================================
# Error Case 2: add_lipid_root() on PROTEIN-rooted default glycosylation
# ============================================================
# add_lipid_root() can only be used on non-PROTEIN default glycosylations

# Find a PROTEIN-rooted default glycosylation to demonstrate the error
default_glycs = pdbreader.get_applied_glycosylations()
protein_chain_idx = None
for chain_idx, gly_obj in default_glycs.items():
    if gly_obj.root.type.upper() == "PROTEIN":
        protein_chain_idx = chain_idx
        break

if protein_chain_idx:
    try:
        # Get PROTEIN-rooted glycosylation
        gly_prot = default_glycs[protein_chain_idx]

        # Try to add lipid root - ERROR: cannot add LIPID above PROTEIN
        gly_prot.add_lipid_root("CER160")
    except ValueError as e:
        print(f"✗ Error: {e}")
else:
    print("No PROTEIN-rooted default glycosylation to demonstrate this error")

# ============================================================
# Error Case 3: Root node can only have ONE child
# ============================================================

try:
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_err = Glycosylation(root=root)

    # First child - OK
    sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    gly_err.add_sugar(sugar1, parent_id=0, site1="1", site2="")

    # Second child to root - ERROR
    sugar2 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    gly_err.add_sugar(sugar2, parent_id=0, site1="1", site2="")

    pdbreader.add_glycosylation(gly_err)
except ValueError as e:
    print(f"✗ Error: {e}")

# ============================================================
# Error Case 4: PROTEIN/LIPID cannot be child nodes
# ============================================================

try:
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_err = Glycosylation(root=root)

    # Try to add PROTEIN as child - ERROR
    wrong_child = Node(name="THR", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="161")
    gly_err.add_sugar(wrong_child, parent_id=0, site1="1", site2="")
except ValueError as e:
    print(f"✗ Error: {e}")

# ============================================================
# Error Case 5: Same site cannot have both modification and linkage
# ============================================================

try:
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_err = Glycosylation(root=root)

    sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    sugar1_id = gly_err.add_sugar(sugar1, parent_id=0, site1="1", site2="")

    # Add modification at site "6"
    gly_err.set_modification(node_id=sugar1_id, key="6", value="S")

    # Try to add child at same site "6" - ERROR (during validation)
    sugar2 = Node(name="DGlc", type=GLYP_NODE_TYPE.BETA)
    gly_err.add_sugar(sugar2, parent_id=sugar1_id, site1="1", site2="6")

    pdbreader.add_glycosylation(gly_err)
except ValueError as e:
    print(f"✗ Error: {e}")

# ============================================================
# Error Case 6: Wrong predefined glycan for root residue type
# ============================================================

try:
    # THR is for O-glycans, but M3 is an N-glycan
    root = Node(name="THR", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="200")
    gly_err = Glycosylation(root=root)

    # N-glycan on THR - ERROR
    gly_err.apply_predefined_glycan(PREDEFINED_GLYCAN.M3)
except ValueError as e:
    print(f"✗ Error: {e}")