Glycosylation
=============

For initial setup, we'll use a PDB that includes many glycosylations::

    import molcube as mc

    molcube = mc.API()

    # Replace with your actual API token
    api_token = 'your_api_token_here'
    molcube.authenticate(api_token=api_token)

    # Placeholder project title; replace with any descriptive string
    title = 'glycosylation_example'

    pdbreader = molcube.create_pdb_reader_project()
    pdbreader.create_project(title=title, ff='charmmff', pdbId='3pqr')
    pdbreader.set_defaults()

When using ``set_defaults()``, |project-bf| automatically sets the appropriate options for modeling glycans already in the PDB.

Check default GLYC Chains
-------------------------

The ``PdbReaderProject`` has a ``glycosylations`` attribute mapping each glycan chain ID to an object representing the glycan sequence. When converted to a string (e.g., by printing it), this object shows a condensed representation of the sequence::

    >>> from pprint import pprint
    >>> pprint(pdbreader.glycosylations)
    {'GLYC_A': 3)bDMan(1->4)bDGlcNAc(1->4)bDGlcNAc(1->)PROT_A-15>,
     'GLYC_B': ,
     'GLYC_C': 1)aDGlc>,
     'GLYC_D': )PROT_A-2>,
     'GLYC_E': }

Understanding Glycosylation Structure
-------------------------------------

Core Concepts
^^^^^^^^^^^^^

A glycosylation structure is represented as a **tree** with:

* **Root Node**: The anchor point (PROTEIN residue or LIPID)
* **Sugar Nodes**: Glycan residues (e.g., DGlc, DMan, DGal)
* **Links**: Connections between nodes with linkage site information

Node Types (`GLYP_NODE_TYPE`)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+------------------+----------------------------------+-----------------------------------------------------------+
| Type             | Description                      | Usage                                                     |
+==================+==================================+===========================================================+
| `PROTEIN`        | Protein residue (ASN, SER, THR)  | **Root node only**, requires `chain` and `resid`          |
+------------------+----------------------------------+-----------------------------------------------------------+
| `LIPID`          | Lipid residue                    | **Root node only**, created via `add_lipid_root()` method |
+------------------+----------------------------------+-----------------------------------------------------------+
| `ALPHA` (or `A`) | Alpha-linked sugar               | Sugar nodes                                               |
+------------------+----------------------------------+-----------------------------------------------------------+
| `BETA` (or `B`)  | Beta-linked sugar                | Sugar nodes                                               |
+------------------+----------------------------------+-----------------------------------------------------------+

Important Constraints
^^^^^^^^^^^^^^^^^^^^^

#. **PROTEIN root glycosylations**

   - Only `ASN`, `SER`, `THR` residues can host glycans
   - **N-glycans** attach to `ASN` or `SER`
   - **O-glycans** attach to `THR` only
   - Requires `chain` and `resid` in the root node
   - Chain index is **auto-assigned** (`GLYC_A`, `GLYC_B`, ...)

#. **LIPID root glycosylations**

   - Created by calling `add_lipid_root('LIPID_NAME')` on an existing glycosylation
   - Must be based on a **non-PROTEIN** default glycosylation from `get_applied_glycosylations()`
   - Inserts a new LIPID node as root, pushing the original root down one level
   - Useful for modifying lipid-attached glycans from PDB

#. **Root node can only have ONE direct child**

   - The first sugar attaches directly to the protein/lipid
   - Additional sugars branch from subsequent sugar nodes

#. **Resid is auto-assigned for sugar nodes**

   - Sugar nodes get sequential resid values (1, 2, 3, ...)
   - Only PROTEIN root requires explicit `chain` and `resid`

Building a Glycosylation Structure
----------------------------------

Step-by-Step Process
^^^^^^^^^^^^^^^^^^^^

To create a custom glycosylation, follow these steps:

1. **Import required classes**
2. **Create a root node** (PROTEIN type with chain and resid)
3. **Create a Glycosylation object** with the root node
4. **Add sugar nodes** using `add_sugar()` method
5. **Optionally add modifications** using `set_modification()` method
6. **Add to project** using `add_glycosylation()` method

Linkage Sites
^^^^^^^^^^^^^

When adding sugars, you need to specify:

- **`site1`**: The linkage site on the **child** (sugar being added), typically "1"
- **`site2`**: The linkage site on the **parent** (where the sugar attaches)

  - For root node (PROTEIN/LIPID): usually empty `""`
  - For sugar nodes: "2", "3", "4", "6" depending on the sugar type

Example 1: Building a Simple N-Glycan
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We'll start with a simple N-glycan with 2 sugars.

::

    # Target structure: bDGlcNAc(1->4)bDGlcNAc(1->)ASN
    #
    #   ASN (root, PROTEIN)
    #    └── bDGlcNAc (id=1) [site1=1, site2=""]
    #         └── bDGlcNAc (id=2) [site1=1, site2=4]

    # Assumes Node, Glycosylation, and GLYP_NODE_TYPE have already been
    # imported (step 1 of the process above)

    # Step 1: Create root node (PROTEIN type requires chain and resid)
    root = Node(
        name="ASN",        # Residue name (must match actual PDB residue)
        # Root type for user-added glycosylation
        type=GLYP_NODE_TYPE.PROTEIN,
        chain="PROT_A",    # Chain index from PDB
        resid="160"        # Residue ID from PDB
    )

    # Step 2: Create Glycosylation object
    # (chain_index will be auto-assigned)
    gly = Glycosylation(root=root)

    print(f"Root node created: id={root.id}, "
          f"name={root.name}, "
          f"type={root.type}")
    print(f"Registry after root: {list(gly._registry.keys())}")

    # Step 3: Add first sugar (directly attached to root)
    sugar1 = Node(
        name="DGlcNAc",             # Sugar name (case-insensitive)
        type=GLYP_NODE_TYPE.BETA    # Beta linkage (can also use "B")
    )
    # Note: resid is auto-assigned for sugar nodes (1, 2, 3, ...)
    sugar1_id = gly.add_sugar(
        node=sugar1,
        parent_id=0,    # 0 is the root node id
        site1="1",      # Child linkage site (usually "1" for sugars)
        site2=""        # Parent site (empty for root)
    )
    print(f"Sugar1 added: id={sugar1_id}, "
          f"name={sugar1.name}, "
          f"resid={sugar1.resid}")

    # Step 4: Add second sugar (attached to sugar1 at site 4)
    sugar2 = Node(
        name="DGlcNAc",
        type="B"    # String "B" also works for BETA
    )
    sugar2_id = gly.add_sugar(
        node=sugar2,
        parent_id=sugar1_id,    # Attach to sugar1
        site1="1",              # Child linkage site
        site2="4"               # Parent linkage site (1->4 linkage)
    )
    print(f"Sugar2 added: id={sugar2_id}, "
          f"name={sugar2.name}, "
          f"resid={sugar2.resid}")

    # Check the structure before adding to project
    state = pdbreader.get_glycosylation_state(gly)
    assert state, "Failed to get glycosylation"
    print(state)

    pdbreader.add_glycosylation(gly)
    assert pdbreader.confirm_chains()
    assert pdbreader.model_pdb()
    pdbreader.download_project('glycan_1.pdb')

Example 2: Branched Glycan Structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This time we'll build a more complex one with branching.
::

    # Target structure: bDMan(1->6)[bDMan(1->3)]bDMan(1->4)bDGlcNAc(1->)ASN
    #
    #   ASN (root)
    #    └── bDGlcNAc (id=1) [1->]
    #         └── bDMan (id=2) [1->4]
    #              ├── bDMan (id=3) [1->6]  ← branch 1
    #              └── bDMan (id=4) [1->3]  ← branch 2

    # Create root node
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="315")
    gly_branched = Glycosylation(root=root)

    # Add first sugar (core GlcNAc)
    sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    sugar1_id = gly_branched.add_sugar(sugar1, parent_id=0, site1="1", site2="")

    # Add second sugar (core Man)
    sugar2 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
    sugar2_id = gly_branched.add_sugar(sugar2, parent_id=sugar1_id, site1="1", site2="4")

    # Add branch 1: Man at site 6
    branch1 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
    branch1_id = gly_branched.add_sugar(branch1, parent_id=sugar2_id, site1="1", site2="6")

    # Add branch 2: Man at site 3
    branch2 = Node(name="DMan", type=GLYP_NODE_TYPE.BETA)
    branch2_id = gly_branched.add_sugar(branch2, parent_id=sugar2_id, site1="1", site2="3")

    # Check the structure before adding to project
    state = pdbreader.get_glycosylation_state(gly_branched)
    assert state, "Failed to get glycosylation"
    print(state)

    pdbreader.add_glycosylation(gly_branched)
    assert pdbreader.confirm_chains()
    assert pdbreader.model_pdb()
    pdbreader.download_project('glycan_2.pdb')

Example 3: Glycan with Modifications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can add modifications (like sulfate) to sugar nodes using ``set_modification()``.

**Important constraints:**

* Modifications can only be added to **sugar nodes** (A or B type), not to root nodes
* A site cannot be used for both modification and child linkage
* Available modifications can be checked in ``get_available_glycosylation_info()`` output

::

    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_mod = Glycosylation(root=root)

    # Add sugar
    sugar = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
    sugar_id = gly_mod.add_sugar(sugar, parent_id=0, site1="1", site2="")

    # Add modification to sugar node
    # Parameters: node_id, site (key), modification value
    gly_mod.set_modification(node_id=sugar_id, key="6", value="S")  # Sulfate at site 6

    print("Glycan with modification:")

    # Note: This would fail - cannot add modification to root node
    #gly_mod.set_modification(node_id=0, key="6", value="S")

    # Check the structure before adding to project
    state = pdbreader.get_glycosylation_state(gly_mod)
    assert state, "Failed to get glycosylation"
    print(state)

    pdbreader.add_glycosylation(gly_mod)
    assert pdbreader.confirm_chains()
    assert pdbreader.model_pdb()
    pdbreader.download_project('glycan_3.pdb')

Example 4: Using Predefined N-Glycan (M3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The system provides commonly used glycan structures that can be applied with a single call.

Available Predefined Glycans (`PREDEFINED_GLYCAN`):

+--------------------------------------+---------------------------------------------------------+
| Category                             | Glycans                                                 |
+======================================+=========================================================+
| **N-glycans** (attach to ASN or SER) | M3, M5, M9, FA2G2S2, FA3G3S3, FA4G4S4, FA2BG2S2, Hybrid |
+--------------------------------------+---------------------------------------------------------+
| **O-glycans** (attach to THR)        | Core_1 ~ Core_8, Extended_Core_1 ~ Extended_Core_4      |
+--------------------------------------+---------------------------------------------------------+

::

    # Create root (ASN for N-glycans)
    root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_m3 = Glycosylation(root=root)

    # Apply M3 predefined glycan
    gly_m3.apply_predefined_glycan(PREDEFINED_GLYCAN.M3)  # or "M3"

    print("M3 N-glycan structure:")
    print(f"Total nodes: {len(gly_m3._registry)}")

    # Check the structure before adding to project
    state = pdbreader.get_glycosylation_state(gly_m3)
    assert state, "Failed to get glycosylation"
    print(state)

    pdbreader.add_glycosylation(gly_m3)
    assert pdbreader.confirm_chains()
    assert pdbreader.model_pdb()
    pdbreader.download_project('glycan_4.pdb')

Managing Glycosylations in PDBReader
------------------------------------

Once you've built a Glycosylation object, you can manage it using these methods:

+--------------------------------------------+----------------------------------------+
| Method                                     | Description                            |
+============================================+========================================+
| ``add_glycosylation(gly)``                 | Add a new glycosylation to the project |
+--------------------------------------------+----------------------------------------+
| ``update_glycosylation(chain_index, gly)`` | Replace an existing glycosylation      |
+--------------------------------------------+----------------------------------------+
| ``delete_glycosylation(chain_index)``      | Remove a glycosylation                 |
+--------------------------------------------+----------------------------------------+
| ``reset_glycosylations()``                 | Reset to default glycosylations        |
+--------------------------------------------+----------------------------------------+
| ``get_glycosylation_state(target)``        | View glycosylation structure           |
+--------------------------------------------+----------------------------------------+
| ``get_applied_glycosylations()``           | Get all registered glycosylations      |
+--------------------------------------------+----------------------------------------+

``add_glycosylation()``
^^^^^^^^^^^^^^^^^^^^^^^

Adds a new glycosylation to the project. The `chain_index` is automatically assigned.

**Constraints:**

* Root must be PROTEIN type (not LIPID)
* Root residue (chain + resid) must exist in the PDB
* Root residue name must match actual residue in PDB
* Cannot add if chain_index is manually set

::

    # Check current glycosylation chain indices
    print("Before adding:")
    print(f"Chain Indices: {pdbreader.glyc_chain_indices}")

    # Add the glycosylation we built earlier (gly with 2 sugars)
    pdbreader.add_glycosylation(gly)

    print(f"\nAssigned chain_index: {gly.chain_index}")
    print(f"After adding - Chain Indices: {pdbreader.glyc_chain_indices}")

    # View the added glycosylation
    print(pdbreader.get_glycosylation_state(gly.chain_index))

``update_glycosylation()``
^^^^^^^^^^^^^^^^^^^^^^^^^^

Replaces an existing glycosylation with a new one. The `chain_index` must already exist.
::

    # Create a new glycosylation to replace the existing one
    root_update = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
    gly_update = Glycosylation(root=root_update)

    # Use M5 predefined glycan
    gly_update.apply_predefined_glycan(PREDEFINED_GLYCAN.M5)

    # Update the existing glycosylation
    chain_to_update = gly.chain_index
    pdbreader.update_glycosylation(chain_to_update, gly_update)

    print(f"\nUpdated glycosylation at {chain_to_update}:")
    print(pdbreader.get_glycosylation_state(chain_to_update))

``delete_glycosylation()``
^^^^^^^^^^^^^^^^^^^^^^^^^^

Removes a glycosylation from the project by its chain index.

::

    # Delete the glycosylation we just added/updated
    print(f"Before delete - Chain Indices: {pdbreader.glyc_chain_indices}")

    pdbreader.delete_glycosylation(chain_to_update)

    print(f"After delete - Chain Indices: {pdbreader.glyc_chain_indices}")

``reset_glycosylations()``
^^^^^^^^^^^^^^^^^^^^^^^^^^

Resets all glycosylations to the server's default state (original PDB glycosylations).

::

    # Reset to default glycosylations
    print("Before reset:")
    print(f"Chain Indices: {pdbreader.glyc_chain_indices}")

    pdbreader.reset_glycosylations()

    print("\nAfter reset:")
    print(f"Chain Indices: {pdbreader.glyc_chain_indices}")

Common Errors and Constraints
-----------------------------

The first block below demonstrates a LIPID root glycosylation with ``add_lipid_root()`` (Example 5); the blocks after it show common error scenarios to be aware of.

::

    # ============================================================
    # Example 5: LIPID Root Glycosylation using add_lipid_root()
    # ============================================================
    # LIPID root is added above an existing non-PROTEIN default glycosylation
    #
    # Before: OldRoot(type=B) → Sugar1 → Sugar2
    # After:  LIPID_NAME(type=LIPID) → OldRoot(type=B) → Sugar1 → Sugar2
    #
    # NOTE: This feature requires:
    #   1. The PDB to have non-PROTEIN root glycosylations (glycolipid structures)
    #   2. A valid lipid name from avail_glyc_lipids in available_glycosylation.json

    # Check which default glycosylations have non-PROTEIN roots
    print("Checking default glycosylations for non-PROTEIN roots...")
    default_glycs = pdbreader.get_applied_glycosylations()
    avail_add_lipid_obj = []
    for chain_idx, gly_obj in default_glycs.items():
        root_type = gly_obj.root.type.upper()
        if root_type != "PROTEIN":
            print(f" ✓ {chain_idx}: {gly_obj.root.name} (type={root_type}) - Can use add_lipid_root()")
            avail_add_lipid_obj.append(gly_obj)
        else:
            print(f" ✗ {chain_idx}: {gly_obj.root.name} (type=PROTEIN) - Cannot use add_lipid_root()")

    # Stop early if the PDB has no glycosylation that qualifies
    assert avail_add_lipid_obj, "No non-PROTEIN default glycosylation found"
    add_lipid_obj = avail_add_lipid_obj[0]  # Using the first available glycosylation object

    print("Before add_lipid_root:")
    print(f" root.id={add_lipid_obj.root.id}, root.name={add_lipid_obj.root.name}, root.type={add_lipid_obj.root.type}")
    print(pdbreader.get_glycosylation_state(add_lipid_obj))

    # Add LIPID root above existing structure
    # Use a lipid name from available_glycosylation.json -> avail_glyc_lipids
    # LIPID root gets id = -1, existing nodes keep their original IDs
    add_lipid_obj.add_lipid_root("CER160")

    print("After add_lipid_root:")
    print(f" root.id={add_lipid_obj.root.id}, root.name={add_lipid_obj.root.name}, root.type={add_lipid_obj.root.type}")
    print(" Original root (id=0) is now child of LIPID (id=-1)")
    print(pdbreader.get_glycosylation_state(add_lipid_obj))

    # Run model_pdb with LIPID root glycosylation applied
    print("Running model_pdb with LIPID root...")
    pdbreader.model_pdb()

    # ============================================================
    # Error Case 1: add_lipid_root() on glycosylation without chain_index
    # ============================================================
    # add_lipid_root() requires the glycosylation to have a chain_index (from default glycosylations)
    try:
        # Create a new glycosylation (doesn't have chain_index yet)
        root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
        gly_new = Glycosylation(root=root)

        # Try to add lipid root - ERROR: no chain_index
        gly_new.add_lipid_root("CER160")
    except ValueError as e:
        print(f"✗ Error: {e}")

::

    # ============================================================
    # Error Case 2: add_lipid_root() on PROTEIN-rooted default glycosylation
    # ============================================================
    # add_lipid_root() can only be used on non-PROTEIN default glycosylations

    # Find a PROTEIN-rooted default glycosylation to demonstrate the error
    default_glycs = pdbreader.get_applied_glycosylations()
    protein_chain_idx = None
    for chain_idx, gly_obj in default_glycs.items():
        if gly_obj.root.type.upper() == "PROTEIN":
            protein_chain_idx = chain_idx
            break

    if protein_chain_idx:
        try:
            # Get PROTEIN-rooted glycosylation
            gly_prot = default_glycs[protein_chain_idx]

            # Try to add lipid root - ERROR: cannot add LIPID above PROTEIN
            gly_prot.add_lipid_root("CER160")
        except ValueError as e:
            print(f"✗ Error: {e}")
    else:
        print("No PROTEIN-rooted default glycosylation to demonstrate this error")

::

    # ============================================================
    # Error Case 3: Root node can only have ONE child
    # ============================================================
    try:
        root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
        gly_err = Glycosylation(root=root)

        # First child - OK
        sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
        gly_err.add_sugar(sugar1, parent_id=0, site1="1", site2="")

        # Second child to root - ERROR
        sugar2 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
        gly_err.add_sugar(sugar2, parent_id=0, site1="1", site2="")
        pdbreader.add_glycosylation(gly_err)
    except ValueError as e:
        print(f"✗ Error: {e}")

::

    # ============================================================
    # Error Case 4: PROTEIN/LIPID cannot be child nodes
    # ============================================================
    try:
        root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
        gly_err = Glycosylation(root=root)

        # Try to add PROTEIN as child - ERROR
        wrong_child = Node(name="THR", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="161")
        gly_err.add_sugar(wrong_child, parent_id=0, site1="1", site2="")
    except ValueError as e:
        print(f"✗ Error: {e}")

::

    # ============================================================
    # Error Case 5: Same site cannot have both modification and linkage
    # ============================================================
    try:
        root = Node(name="ASN", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="160")
        gly_err = Glycosylation(root=root)

        sugar1 = Node(name="DGlcNAc", type=GLYP_NODE_TYPE.BETA)
        sugar1_id = gly_err.add_sugar(sugar1, parent_id=0, site1="1", site2="")

        # Add modification at site "6"
        gly_err.set_modification(node_id=sugar1_id, key="6", value="S")

        # Try to add child at same site "6" - ERROR (during validation)
        sugar2 = Node(name="DGlc", type=GLYP_NODE_TYPE.BETA)
        gly_err.add_sugar(sugar2, parent_id=sugar1_id, site1="1", site2="6")
        pdbreader.add_glycosylation(gly_err)
    except ValueError as e:
        print(f"✗ Error: {e}")

::

    # ============================================================
    # Error Case 6: Wrong predefined glycan for root residue type
    # ============================================================
    try:
        # THR is for O-glycans, but M3 is an N-glycan
        root = Node(name="THR", type=GLYP_NODE_TYPE.PROTEIN, chain="PROT_A", resid="200")
        gly_err = Glycosylation(root=root)

        # N-glycan on THR - ERROR
        gly_err.apply_predefined_glycan(PREDEFINED_GLYCAN.M3)
    except ValueError as e:
        print(f"✗ Error: {e}")
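
The structural rules exercised by the error cases above can be reproduced offline with a small toy model, which may help when planning a glycan tree before talking to the server. This is only a sketch in plain Python, not molcube's implementation: ``ToyNode``, ``ToyGlycan``, ``SUGAR_TYPES``, and ``condensed()`` are hypothetical names, and unlike the real API (where some checks are reported only during validation) the toy raises ``ValueError`` immediately. It covers three documented rules (one direct child under the root, no PROTEIN/LIPID children, no site used for both a modification and a linkage) and renders the condensed notation used in the target-structure comments.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from accepted sugar type spellings to the notation prefix
SUGAR_TYPES = {"A": "a", "ALPHA": "a", "B": "b", "BETA": "b"}


@dataclass
class ToyNode:
    """Stand-in for a glycan tree node (not molcube's Node)."""
    name: str
    type: str
    site1: str = ""    # linkage site on this node (child side)
    site2: str = ""    # linkage site on the parent
    children: list = field(default_factory=list)        # child node ids
    modifications: dict = field(default_factory=dict)   # site -> value


class ToyGlycan:
    """Toy tree enforcing the constraints described in this section."""

    def __init__(self, root_name, root_type="PROTEIN"):
        self.nodes = {0: ToyNode(root_name, root_type)}  # id 0 is the root

    def add_sugar(self, name, type, parent_id, site1="1", site2=""):
        if type.upper() not in SUGAR_TYPES:
            raise ValueError(f"{type} nodes cannot be children")
        parent = self.nodes[parent_id]
        if parent_id == 0 and parent.children:
            raise ValueError("root node can only have one direct child")
        if site2 in parent.modifications:
            raise ValueError(f"site {site2} already carries a modification")
        node_id = len(self.nodes)  # sequential ids: 1, 2, 3, ...
        self.nodes[node_id] = ToyNode(name, type.upper(), site1, site2)
        parent.children.append(node_id)
        return node_id

    def set_modification(self, node_id, key, value):
        node = self.nodes[node_id]
        if node.type not in SUGAR_TYPES:
            raise ValueError("modifications are only allowed on sugar nodes")
        if any(self.nodes[c].site2 == key for c in node.children):
            raise ValueError(f"site {key} is already used for a linkage")
        node.modifications[key] = value

    def condensed(self, node_id=0):
        """Render the subtree: first child is the main chain, others go in [...]."""
        node = self.nodes[node_id]
        label = SUGAR_TYPES.get(node.type, "") + node.name
        chains = [
            f"{self.condensed(c)}({self.nodes[c].site1}->{self.nodes[c].site2})"
            for c in node.children
        ]
        if not chains:
            return label
        return chains[0] + "".join(f"[{b}]" for b in chains[1:]) + label


# Rebuild the branched glycan from Example 2 with the toy model
gly = ToyGlycan("ASN")
core = gly.add_sugar("DGlcNAc", "B", parent_id=0)
man = gly.add_sugar("DMan", "B", parent_id=core, site2="4")
gly.add_sugar("DMan", "B", parent_id=man, site2="6")
gly.add_sugar("DMan", "B", parent_id=man, site2="3")
print(gly.condensed())
# bDMan(1->6)[bDMan(1->3)]bDMan(1->4)bDGlcNAc(1->)ASN
```

Checking a planned tree this way catches structural mistakes early, but it cannot replace ``get_glycosylation_state()``, which also validates the root residue against the PDB and the sugar names against the server's available glycosylation data.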