BioJava:CookBook:PDB:mmcif

How do I read a .mmcif file?

mmCIF is an alternative file format to PDB files (1, 2). mmCIF files are parsed into the same BioJava data structure as PDB files. The example below demonstrates how to load the content of a file into the BioJava data model for protein structures.

To parse an mmCIF file, do the following:

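	// Available since BioJava 1.7
	import java.io.IOException;
	import org.biojava.bio.structure.Structure;
	import org.biojava.bio.structure.io.MMCIFFileReader;
	import org.biojava.bio.structure.io.StructureIOFile;

	public class ReadMMCIF {
		public static void main(String[] args) {
			// Path to the mmCIF file to load
			String file = "/path/to/myfile.cif.gz";
			StructureIOFile pdbreader = new MMCIFFileReader();
			try {
				Structure s = pdbreader.getStructure(file);
				// Print a summary of the structure, then its PDB-format representation
				System.out.println(s);
				System.out.println(s.toPDB());
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}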

Parse into custom data structures

By default, the file content is loaded into the BioJava data structures. The parser has a built-in event model that lets you populate your own custom data structures instead. To do so, implement the MMcifConsumer interface and register your implementation with the parser. If you do not need a custom data model, just use SimpleMMcifConsumer, which builds the standard BioJava structure object.

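	// Available since BioJava 1.7
	import java.io.BufferedReader;
	import java.io.FileInputStream;
	import java.io.IOException;
	import java.io.InputStreamReader;
	import org.biojava.bio.structure.Structure;
	import org.biojava.bio.structure.io.mmcif.MMcifParser;
	import org.biojava.bio.structure.io.mmcif.SimpleMMcifConsumer;
	import org.biojava.bio.structure.io.mmcif.SimpleMMcifParser;

	public class ParseMMCIF {
		public static void main(String[] args) {
			String fileName = args[0];

			MMcifParser parser = new SimpleMMcifParser();
			SimpleMMcifConsumer consumer = new SimpleMMcifConsumer();

			// The consumer builds up the BioJava structure object.
			// You could also hook in your own consumer and build up your own data model.
			parser.addMMcifConsumer(consumer);

			// Opening the file can fail too, so it belongs inside the try block.
			try (BufferedReader reader = new BufferedReader(
					new InputStreamReader(new FileInputStream(fileName)))) {
				parser.parse(reader);
			} catch (IOException e) {
				e.printStackTrace();
				return;
			}

			// Now get the protein structure.
			Structure cifStructure = consumer.getStructure();
			System.out.println(cifStructure);
		}
	}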
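
The event model itself can be illustrated without BioJava. The following is only a minimal sketch of the general idea: a parser notifies registered consumers about each record, and each consumer decides what data model to build. All names in it (RecordConsumer, ToyParser, ChainAtomCounter, countAtomsPerChain) are hypothetical and are not part of the BioJava API, and the input is a toy one-record-per-line format rather than real mmCIF syntax. A real custom consumer would implement the MMcifConsumer interface instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConsumerSketch {

    // Hypothetical callback interface (NOT BioJava's MMcifConsumer).
    interface RecordConsumer {
        void documentStart();
        void newRecord(String category, List<String> values);
        void documentEnd();
    }

    // Hypothetical toy parser: one record per line, "CATEGORY value value ...".
    // This is an assumption made for illustration, not real mmCIF syntax.
    static class ToyParser {
        private final List<RecordConsumer> consumers = new ArrayList<>();

        void addConsumer(RecordConsumer consumer) {
            consumers.add(consumer);
        }

        void parse(BufferedReader reader) throws IOException {
            for (RecordConsumer c : consumers) c.documentStart();
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
                String[] tokens = line.split("\\s+");
                List<String> values = Arrays.asList(tokens).subList(1, tokens.length);
                // Every registered consumer sees every record.
                for (RecordConsumer c : consumers) c.newRecord(tokens[0], values);
            }
            for (RecordConsumer c : consumers) c.documentEnd();
        }
    }

    // A custom data model: atom counts per chain, built from ATOM records only.
    static class ChainAtomCounter implements RecordConsumer {
        final Map<String, Integer> atomsPerChain = new TreeMap<>();

        public void documentStart() { atomsPerChain.clear(); }

        public void newRecord(String category, List<String> values) {
            if (category.equals("ATOM") && !values.isEmpty()) {
                atomsPerChain.merge(values.get(0), 1, Integer::sum);
            }
        }

        public void documentEnd() { }
    }

    // Wires parser and consumer together, mirroring the addMMcifConsumer pattern above.
    static Map<String, Integer> countAtomsPerChain(String text) {
        ToyParser parser = new ToyParser();
        ChainAtomCounter counter = new ChainAtomCounter();
        parser.addConsumer(counter);
        try {
            parser.parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return counter.atomsPerChain;
    }

    public static void main(String[] args) {
        String toyInput = "# chain x y z\n"
                + "ATOM A 1.0 2.0 3.0\n"
                + "ATOM A 2.0 2.0 3.0\n"
                + "ATOM B 0.0 0.0 0.0\n"
                + "HELIX A 1 10\n";
        System.out.println(countAtomsPerChain(toyInput)); // prints {A=2, B=1}
    }
}
```

The point of the design is that the parser never needs to know what is being built: swapping ChainAtomCounter for another consumer changes the resulting data model without touching the parsing code.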

For more info on how to work with the BioJava structure data model see BioJava:CookBook:PDB:atoms.

References

  1. Westbrook JD and Bourne PE. STAR/mmCIF: an ontology for macromolecular structure. Bioinformatics 2000 Feb; 16(2): 159-68. PMID: 10842738.
  2. Westbrook JD and Fitzgerald PM. The PDB format, mmCIF, and other data formats. Methods Biochem Anal 2003; 44: 161-79. PMID: 12647386.