RCSB Viewers:MBT Libs:Bonds and Nucleic Acid Identification^Classification

From BioJava

Notes

  • Bond records are ignored in the loaders. Bonds are determined either through a dictionary lookup, or via calculation if the lookup fails.
  • Currently the lookup files described here are generated by an external process and are incorporated directly within the 'Structure Models' jar as a resource. This means that they can only be updated if the 'Structure Models' jar is updated.

Relevant Classes

  • Bond - definition class
  • BondFactory - Creates the bonds (static)
  • ChemicalComponentBonds - does lookup for bonds
  • NucleicAcidInfo - does lookup for nucleic acids
  • Octree - for calculating bonds
  • OctreeAtomItem - for Octree
  • OctreeDataItem - for Octree

Explanation

MBT maintains a dictionary of known structures. This comes from a combined .cif file that is found at this ftp site:

   	ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz

This file is loaded and broken apart by an external process - see the tools package in the 'RCSB Excluded' project.

ChemicalComponentBondsCreator is run from the command line against the file. It's not a full parser - it extracts only the bond information. Its output ('ChemicalComponentBonds.dat') is copied into the 'RCSB MBT Libs' project, source directory 'Structure Model', in the package util, as a resource.
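To make the "extract only the bonds" idea concrete, here is a minimal sketch of pulling bond rows out of a components.cif style file. It is not the ChemicalComponentBondsCreator source. The class name ChemCompBondExtractorSketch and the one-line-per-bond output layout are assumptions made for illustration; the actual layout of 'ChemicalComponentBonds.dat' is not described here. The _chem_comp_bond column names are the standard mmCIF ones. The sketch only handles bonds written as a loop_ table and uses a simple whitespace tokenizer, so components whose bond data is written as key-value pairs are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChemCompBondExtractorSketch {

    // Standard mmCIF column names for the bond category.
    private static final String COMP = "_chem_comp_bond.comp_id";
    private static final String ATOM1 = "_chem_comp_bond.atom_id_1";
    private static final String ATOM2 = "_chem_comp_bond.atom_id_2";
    private static final String ORDER = "_chem_comp_bond.value_order";

    /** Returns one "COMP ATOM1 ATOM2 ORDER" line (assumed layout) per bond row found. */
    static List<String> extract(Iterable<String> cifLines) {
        List<String> out = new ArrayList<>();
        List<String> columns = new ArrayList<>(); // column names of the loop being read
        boolean inHeader = false;   // reading the column-name lines that follow "loop_"
        boolean inBondRows = false; // reading data rows of a _chem_comp_bond loop

        for (String raw : cifLines) {
            String line = raw.trim();
            if (line.equals("loop_")) {
                columns.clear();
                inHeader = true;
                inBondRows = false;
                continue;
            }
            if (inHeader && line.startsWith("_")) {
                columns.add(line);
                continue;
            }
            if (inHeader) {
                // First non-column line: the loop header is complete.
                inHeader = false;
                inBondRows = columns.containsAll(Arrays.asList(COMP, ATOM1, ATOM2, ORDER));
            }
            if (!inBondRows) {
                continue;
            }
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("_")
                    || line.startsWith("data_")) {
                inBondRows = false; // the loop's rows have ended
                continue;
            }
            String[] f = line.split("\\s+");
            if (f.length != columns.size()) {
                continue; // skip rows this simple tokenizer cannot handle
            }
            out.add(unquote(f[columns.indexOf(COMP)]) + " "
                    + unquote(f[columns.indexOf(ATOM1)]) + " "
                    + unquote(f[columns.indexOf(ATOM2)]) + " "
                    + unquote(f[columns.indexOf(ORDER)]));
        }
        return out;
    }

    /** Strips the double quotes CIF puts around values such as "O5'". */
    private static String unquote(String s) {
        boolean quoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return quoted ? s.substring(1, s.length() - 1) : s;
    }
}
```

Everything outside the bond loops (atom tables, descriptors, audit records) is simply ignored, which is what keeps a tool like this far smaller than a full CIF parser.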

At runtime, this abbreviated file is picked up and put into a hash-table. Atoms are checked against this for bond information.
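The runtime lookup described above can be sketched roughly as follows. This is not the ChemicalComponentBonds class itself. BondDictionarySketch, BondEntry, and the line layout "COMPOUND ATOM1 ATOM2 ORDER" are assumptions for illustration only, since the real format of the .dat file is not documented here.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BondDictionarySketch {

    /** One dictionary bond: two atom names within a compound, plus a bond order label. */
    static final class BondEntry {
        final String atom1;
        final String atom2;
        final String order;

        BondEntry(String atom1, String atom2, String order) {
            this.atom1 = atom1;
            this.atom2 = atom2;
            this.order = order;
        }
    }

    // Compound code (for example "GLY") -> bonds known for that compound.
    private final Map<String, List<BondEntry>> byCompound = new HashMap<>();

    /** Reads lines of the ASSUMED form "COMPOUND ATOM1 ATOM2 ORDER" into the hash table. */
    static BondDictionarySketch load(Reader source) {
        BondDictionarySketch dict = new BondDictionarySketch();
        new BufferedReader(source).lines().forEach(dict::addLine);
        return dict;
    }

    private void addLine(String raw) {
        String line = raw.trim();
        if (line.isEmpty() || line.startsWith("#")) {
            return; // blank line or comment
        }
        String[] f = line.split("\\s+");
        if (f.length != 4) {
            return; // malformed line: ignore rather than fail the whole load
        }
        byCompound.computeIfAbsent(f[0], k -> new ArrayList<>())
                  .add(new BondEntry(f[1], f[2], f[3]));
    }

    /** Returns the bonds for a compound, or an empty list when the compound is unknown. */
    List<BondEntry> bondsFor(String compoundCode) {
        return byCompound.getOrDefault(compoundCode, Collections.emptyList());
    }
}
```

In the setup described above, the Reader would wrap a resource stream from the jar (for example one obtained with Class.getResourceAsStream). An empty result from bondsFor is the signal to fall back to calculated bonds.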

If bonds are not found for a given residue, the atoms are run through a bond-generation algorithm that determines bonds by distance. The atoms are first arranged in an octree so that the spatial checks are fast.
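As an illustration of distance-based bonding backed by an octree, here is a minimal sketch. It is not the MBT Octree, OctreeAtomItem, or BondFactory code. All names are hypothetical, and the single fixed cutoff distance is a simplifying assumption (a real implementation would more likely vary the threshold by element).

```java
import java.util.ArrayList;
import java.util.List;

class DistanceBondSketch {

    static final class Atom {
        final int id; // assumed unique per atom
        final double x, y, z;
        Atom(int id, double x, double y, double z) {
            this.id = id; this.x = x; this.y = y; this.z = z;
        }
    }

    /** Minimal point octree: a leaf splits into 8 children once it exceeds CAPACITY. */
    static final class Octree {
        static final int CAPACITY = 8;
        static final int MAX_DEPTH = 16; // stops endless splitting of coincident atoms
        final double cx, cy, cz, half;   // cube center and half edge length
        final int depth;
        List<Atom> atoms = new ArrayList<>();
        Octree[] kids;                   // null while this node is a leaf

        Octree(double cx, double cy, double cz, double half, int depth) {
            this.cx = cx; this.cy = cy; this.cz = cz; this.half = half; this.depth = depth;
        }

        void insert(Atom a) {
            if (kids != null) {
                kids[octant(a)].insert(a);
                return;
            }
            atoms.add(a);
            if (atoms.size() > CAPACITY && depth < MAX_DEPTH) {
                split();
            }
        }

        private int octant(Atom a) {
            return (a.x >= cx ? 1 : 0) | (a.y >= cy ? 2 : 0) | (a.z >= cz ? 4 : 0);
        }

        private void split() {
            double h = half / 2;
            kids = new Octree[8];
            for (int i = 0; i < 8; i++) {
                kids[i] = new Octree(cx + ((i & 1) != 0 ? h : -h),
                                     cy + ((i & 2) != 0 ? h : -h),
                                     cz + ((i & 4) != 0 ? h : -h), h, depth + 1);
            }
            List<Atom> old = atoms;
            atoms = new ArrayList<>();
            for (Atom a : old) {
                kids[octant(a)].insert(a);
            }
        }

        /** Adds to out every stored atom within distance r of p (p itself included). */
        void query(Atom p, double r, List<Atom> out) {
            // Conservative test: skip cubes that cannot reach the query sphere.
            if (Math.abs(p.x - cx) > half + r || Math.abs(p.y - cy) > half + r
                    || Math.abs(p.z - cz) > half + r) {
                return;
            }
            if (kids == null) {
                for (Atom a : atoms) {
                    if (dist2(a, p) <= r * r) out.add(a);
                }
            } else {
                for (Octree k : kids) k.query(p, r, out);
            }
        }
    }

    private static double dist2(Atom a, Atom b) {
        double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }

    /** Returns {id1, id2} pairs (id1 < id2) for atoms no farther apart than cutoff. */
    static List<int[]> findBonds(List<Atom> atoms, double cutoff) {
        List<int[]> bonds = new ArrayList<>();
        if (atoms.isEmpty()) return bonds;
        double[] lo = {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY};
        double[] hi = {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (Atom a : atoms) {
            double[] p = {a.x, a.y, a.z};
            for (int i = 0; i < 3; i++) {
                lo[i] = Math.min(lo[i], p[i]);
                hi[i] = Math.max(hi[i], p[i]);
            }
        }
        double half = 1e-9; // root cube must enclose every atom
        for (int i = 0; i < 3; i++) half = Math.max(half, (hi[i] - lo[i]) / 2 + 1e-9);
        Octree root = new Octree((lo[0] + hi[0]) / 2, (lo[1] + hi[1]) / 2, (lo[2] + hi[2]) / 2, half, 0);
        for (Atom a : atoms) root.insert(a);

        List<Atom> near = new ArrayList<>();
        for (Atom a : atoms) {
            near.clear();
            root.query(a, cutoff, near);
            for (Atom b : near) {
                if (a.id < b.id) bonds.add(new int[] {a.id, b.id}); // each pair once
            }
        }
        return bonds;
    }
}
```

The point of the tree is that each atom is compared only against atoms in nearby cubes instead of against every other atom in the structure, which matters for large structures.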

For the entry point, look in the 'RCSB MBT Libs' project, source directory 'Structure Model', in the package model, at the StructureMap class. In there, find generateBonds(). Note that it checks a flag that bypasses the dictionary lookup and uses the distance algorithm exclusively (this is probably there mainly for debugging). The BondFactory class is what performs the dictionary lookup or the bond calculation, depending on what's required.
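The control flow described above (dictionary first, distance calculation as the fallback, with a flag that forces the calculation) can be sketched as below. This is not the generateBonds() or BondFactory source. BondStrategySketch, its parameters, and the use of plain strings for bonds are hypothetical simplifications.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BondStrategySketch {

    /**
     * Returns bonds for one residue.
     *
     * @param compound      residue/compound code used as the dictionary key
     * @param dictionary    known bonds per compound (the lookup table)
     * @param byDistance    fallback that calculates bonds for the residue
     * @param forceDistance when true, the dictionary is skipped entirely
     */
    static List<String> bondsFor(String compound,
                                 Map<String, List<String>> dictionary,
                                 Function<String, List<String>> byDistance,
                                 boolean forceDistance) {
        if (!forceDistance) {
            List<String> known = dictionary.get(compound);
            if (known != null) {
                return known; // dictionary hit: no calculation needed
            }
        }
        // Either the flag is set or the compound is not in the dictionary.
        return byDistance.apply(compound);
    }
}
```

Keeping the flag check in one place means the distance algorithm can be exercised on residues that are in the dictionary, which fits the debugging use suggested above.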

Incidentally, the same kind of mechanism is used to determine nucleic acid classification. In the 'RCSB Excluded' project, source directory 'CL Tools', FindAllNucleicAcidCompoundNames is also run from the command line and generates an output file ('NucleicAcidCompoundNames.dat').
