BioJava:CookBook:PDB:read
How do I read a PDB file?
BioJava provides a PDB file parser that reads the content of a PDB file into a flexible data model for managing protein structural data. It is possible to
- parse individual PDB files, or
- work with local PDB file installations.
The core functionality for this is provided by the PDBFileReader class.
Short Example: the quickest way to read a local file
    // also works for gzip compressed files
    String filename = "path/to/pdbfile.ent";
    PDBFileReader pdbreader = new PDBFileReader();
    try {
        Structure struc = pdbreader.getStructure(filename);
    } catch (Exception e) {
        e.printStackTrace();
    }
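The comment in the short example says that gzip-compressed files are read as well. BioJava takes care of this internally. The following is only a minimal plain-Java sketch of how such transparent handling can be done, by checking the two gzip magic bytes at the start of the file. The class MaybeGzip and its methods are hypothetical names used for illustration; they are not part of BioJava and do not describe how PDBFileReader is implemented.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, NOT part of BioJava: opens a file and unwraps
// gzip compression if the file starts with the gzip magic bytes.
class MaybeGzip {

    static InputStream open(Path file) throws IOException {
        BufferedInputStream in = new BufferedInputStream(Files.newInputStream(file));
        in.mark(2);                 // remember the start of the stream
        int b1 = in.read();
        int b2 = in.read();
        in.reset();                 // rewind so the caller sees the whole file
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Reads the whole (possibly compressed) file as text.
    static String readText(Path file) throws IOException {
        try (InputStream in = open(file)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MaybeGzip.java <file>");
            return;
        }
        System.out.print(readText(Paths.get(args[0])));
    }
}
```

With a helper like this, the same call works for both pdbfile.ent and pdbfile.ent.gz, which is the behaviour the example above relies on.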
Example: How to work with a local installation of PDB
    try {
        PDBFileReader reader = new PDBFileReader();

        // the path to the local PDB installation
        reader.setPath("/tmp");

        // are all files in one directory, or are the files split,
        // as on the PDB ftp servers?
        reader.setPdbDirectorySplit(true);

        // should a missing PDB id be fetched automatically from the FTP servers?
        reader.setAutoFetch(true);

        // should the ATOM and SEQRES residues be aligned when creating the internal data model?
        reader.setAlignSeqRes(false);

        // should secondary structure get parsed from the file
        reader.setParseSecStruc(false);

        Structure structure = reader.getStructureById("4hhb");
        System.out.println(structure);
    } catch (Exception e) {
        e.printStackTrace();
    }
This produces the following output:
    Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb4hhb.ent.gz
    writing to /tmp/hh/pdb4hhb.ent.gz
    structure 4HHB Authors: G.FERMI,M.F.PERUTZ Resolution: 1.74 Technique: X-RAY DIFFRACTION Classification: OXYGEN TRANSPORT DepDate: Wed Mar 07 00:00:00 PST 1984 IdCode: 4HHB Title: THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION ModDate: Tue Feb 24 00:00:00 PST 2009
    chains:
    chain 0: >A< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
    length SEQRES: 0 length ATOM: 198 aminos: 141 hetatms: 57 nucleotides: 0
    chain 1: >B< HEMOGLOBIN (DEOXY) (BETA CHAIN)
    length SEQRES: 0 length ATOM: 205 aminos: 146 hetatms: 59 nucleotides: 0
    chain 2: >C< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
    length SEQRES: 0 length ATOM: 201 aminos: 141 hetatms: 60 nucleotides: 0
    chain 3: >D< HEMOGLOBIN (DEOXY) (BETA CHAIN)
    length SEQRES: 0 length ATOM: 197 aminos: 146 hetatms: 51 nucleotides: 0
    DBRefs: 4
    DBREF 4HHB A 1 141 UNP P69905 HBA_HUMAN 1 141
    DBREF 4HHB B 1 146 UNP P68871 HBB_HUMAN 1 146
    DBREF 4HHB C 1 141 UNP P69905 HBA_HUMAN 1 141
    DBREF 4HHB D 1 146 UNP P68871 HBB_HUMAN 1 146
    Molecules:
    Compound: 1 HEMOGLOBIN (DEOXY) (ALPHA CHAIN) Chains: ChainId: A C Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN
    Compound: 2 HEMOGLOBIN (DEOXY) (BETA CHAIN) Chains: ChainId: B D Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN
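The "writing to /tmp/hh/pdb4hhb.ent.gz" line in the output shows what a split installation looks like: the file for 4hhb is placed in a subdirectory named after the two middle characters of the PDB id, as on the PDB ftp servers. The sketch below reproduces that mapping in plain Java. The class SplitLayout and the method pathFor are hypothetical names for illustration only; they are not BioJava API, and the sketch assumes four-character ids and the pdbXXXX.ent.gz naming seen in the output.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, NOT part of BioJava: computes where a split
// PDB installation is expected to keep the file for a given PDB id.
class SplitLayout {

    static Path pathFor(String root, String pdbId) {
        if (pdbId == null || pdbId.length() != 4) {
            throw new IllegalArgumentException("expected a 4-character PDB id: " + pdbId);
        }
        String id = pdbId.toLowerCase(Locale.ROOT);
        // the two middle characters name the subdirectory, e.g. "hh" for 4hhb
        String middle = id.substring(1, 3);
        return Paths.get(root, middle, "pdb" + id + ".ent.gz");
    }

    public static void main(String[] args) {
        // path for the id and installation directory used in the example above
        System.out.println(pathFor("/tmp", "4hhb"));
    }
}
```

When setPdbDirectorySplit is false, all files are expected directly in the installation directory instead.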
Example: How to parse a local file
This example shows how to read a PDB file from your file system, obtain a Structure object, and iterate over the Groups contained in the file. For more examples of how to access the Atoms, see BioJava:CookBook:PDB:atoms. For more information on how the parser deals with SEQRES and ATOM records, see BioJava:CookBook:PDB:seqres.
    // also works for gzip compressed files
    String filename = "path/to/pdbfile.ent";

    PDBFileReader pdbreader = new PDBFileReader();

    // the following parameters are optional:

    // the parser can read the secondary structure
    // assignment from the PDB file header and add it to the amino acids
    pdbreader.setParseSecStruc(true);

    // align the SEQRES and ATOM records, default = true
    // this slows parsing down slightly, so turn it off if speed matters
    pdbreader.setAlignSeqRes(true);

    // parse the C-alpha atoms only, default = false
    pdbreader.setParseCAOnly(false);

    // download missing PDB files automatically from EBI ftp server, default = false
    pdbreader.setAutoFetch(false);

    try {
        Structure struc = pdbreader.getStructure(filename);
        System.out.println(struc);

        // iterate over all groups (residues, hetero groups, nucleotides)
        GroupIterator gi = new GroupIterator(struc);
        while (gi.hasNext()) {
            Group g = (Group) gi.next();

            if (g instanceof AminoAcid) {
                AminoAcid aa = (AminoAcid) g;
                Map sec = aa.getSecStruc();
                Chain c = g.getParent();
                System.out.println(c.getName() + " " + g + " " + sec);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
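To give a feel for what a Group corresponds to in the file itself, here is a minimal plain-Java sketch that counts distinct residues per chain directly from ATOM and HETATM lines. It relies only on the fixed-column PDB format (residue name in columns 18-20, chain identifier in column 22, residue number and insertion code in columns 23-27). GroupCounter and groupsPerChain are hypothetical names; this is not how BioJava builds its data model, and the sketch ignores alternate locations and multiple models. The sample lines in main are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, NOT part of BioJava: counts distinct residues
// ("groups") per chain from raw PDB coordinate records.
class GroupCounter {

    static Map<Character, Integer> groupsPerChain(List<String> lines) {
        Map<Character, Set<String>> seen = new TreeMap<>();
        for (String line : lines) {
            boolean coordinate = line.startsWith("ATOM  ") || line.startsWith("HETATM");
            if (!coordinate || line.length() < 27) {
                continue; // skip SEQRES, HEADER, truncated lines, ...
            }
            char chain = line.charAt(21);                 // column 22
            String residueKey = line.substring(17, 20)    // columns 18-20: residue name
                    + line.substring(22, 27);             // columns 23-27: number + insertion code
            seen.computeIfAbsent(chain, c -> new HashSet<>()).add(residueKey);
        }
        Map<Character, Integer> counts = new TreeMap<>();
        seen.forEach((chain, residues) -> counts.put(chain, residues.size()));
        return counts;
    }

    public static void main(String[] args) {
        // made-up sample records, laid out in PDB fixed columns
        List<String> lines = Arrays.asList(
                "ATOM      1  N   VAL A   1       1.000   2.000   3.000",
                "ATOM      2  CA  VAL A   1       2.000   3.000   4.000",
                "ATOM      8  N   LEU A   2       3.000   4.000   5.000",
                "HETATM 4387  O   HOH B 201       9.000   9.000   9.000");
        System.out.println(groupsPerChain(lines));
    }
}
```

In BioJava itself the GroupIterator shown above does this work for you and also distinguishes amino acids, hetero groups and nucleotides.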
To learn how to serialize a Structure object to a database, see BioJava:CookBook:PDB:hibernate.
Next: BioJava:CookBook:PDB:atoms - How to access atoms.

