Class MultipleAlignmentTools


  • public class MultipleAlignmentTools
    extends Object
    Utility functions for working with MultipleAlignment.

    Supported functions:

    • Obtain the alignment as sequence strings
    • Map from sequence alignment position to structure Atom
    • Map from sequence alignment position to Block number
    • Transform the aligned Atoms of a MultipleAlignment
    • Get all the core alignment positions of the alignment
    • Calculate the average residue distance of all aligned positions
    • Sort Blocks in a MultipleAlignment by a specified row
    • Convert a MultipleAlignment to a MultipleSequenceAlignment
    Since:
    4.1.0
    Author:
    Spencer Bliven, Aleix Lafita
    • Method Detail

      • getSequenceAlignment

        public static List<StringgetSequenceAlignment​(MultipleAlignment alignment,
                                                        List<Integer> mapSeqToStruct)
        Calculate the sequence alignment Strings for the whole alignment. This method creates a sequence alignment where aligned residues are in uppercase and unaligned residues are in lowercase, thus providing a more compact way to represent the alignment.

        Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is a possible discontinuity.

        Parameters:
        alignment - input MultipleAlignment
        mapSeqToStruct - provides a link from the sequence alignment position to the structure alignment position. Specially designed for the GUI. Has to be initialized previously and will be overwritten.
        Returns:
        a string for each row in the alignment, giving the 1-letter code for each aligned residue.
      • getSequenceAlignment

        public static List<StringgetSequenceAlignment​(MultipleAlignment msa)
        Calculate the sequence alignment Strings for the whole alignment. This method creates a sequence alignment where aligned residues are in uppercase and unaligned residues are in lowercase, thus providing a more compact way to represent the alignment.

        Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is a possible discontinuity.

        Parameters:
        alignment - input MultipleAlignment
        Returns:
        String for each row in the alignment, giving the 1-letter code for each aligned residue.
      • getBlockSequenceAlignment

        public static List<StringgetBlockSequenceAlignment​(MultipleAlignment alignment,
                                                             List<Integer> mapSeqToStruct)
        Calculate the sequence alignment Strings for the alignment Blocks in an alignment. This method creates a sequence alignment where all residues are in uppercase and a residue alone with gaps in all the other structures represents unaligned residues. Because of this latter constraint only the residues within the Blocks are represented, for a more compact alignment. For a sequence alignment of the full protein use getSequenceAlignment(MultipleAlignment).

        Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps between blocks are omitted, while gaps within blocks are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is something unaligned inbetween.

        Parameters:
        alignment - input MultipleAlignment
        mapSeqToStruct - provides a link from the sequence alignment position to the structure alignment position. Specially designed for the GUI. Has to be initialized previously and will be overwritten.
        Returns:
        a string for each row in the alignment, giving the 1-letter code for each aligned residue.
      • getBlockSequenceAlignment

        public static List<StringgetBlockSequenceAlignment​(MultipleAlignment ma)
        Calculate the sequence alignment Strings for the alignment Blocks in an alignment. This method creates a sequence alignment where all residues are in uppercase and a residue alone with gaps in all the other structures represents unaligned residues. Because of this latter constraint only the residues within the Blocks are represented, for a more compact alignment. For a sequence alignment of the full protein use getSequenceAlignment(MultipleAlignment).

        Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps between blocks are omitted, while gaps within blocks are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is something unaligned inbetween.

        Parameters:
        alignment - input MultipleAlignment
        Returns:
        String for each row in the alignment, giving the 1-letter code for each aligned residue.
      • getAtomForSequencePosition

        public static Atom getAtomForSequencePosition​(MultipleAlignment msa,
                                                      List<Integer> mapSeqToStruct,
                                                      int str,
                                                      int sequencePos)
        Returns the Atom of the specified structure that is aligned in the sequence alignment position specified.
        Parameters:
        multAln - the MultipleAlignment object from where the sequence alignment has been generated
        mapSeqToStruct - the mapping between sequence and structure generated with the sequence alignment
        str - the structure index of the alignment (row)
        sequencePos - the sequence alignment position (column)
        Returns:
        Atom the atom in that position or null if there is a gap
      • getBlockForSequencePosition

        public static int getBlockForSequencePosition​(MultipleAlignment multAln,
                                                      List<Integer> mapSeqToStruct,
                                                      int sequencePos)
        Returns the block number of a specified position in the sequence alignment, given the mapping from structure to function.
        Parameters:
        multAln - the MultipleAlignment object from where the sequence alignment has been generated.
        mapSeqToStruct - the mapping between sequence and structure generated with the sequence alignment
        sequencePos - the position in the sequence alignment (column)
        Returns:
        int the block index, or -1 if the position is not aligned
      • getAverageResidueDistances

        public static Matrix getAverageResidueDistances​(MultipleAlignment msa)
        The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.

        Complexity: T(n,l) = O(l*n^2), if n=number of structures and l=alignment length.

        Parameters:
        alignment - MultipleAlignment
        Returns:
        Matrix containing all average residue distances
      • getAverageResidueDistances

        public static Matrix getAverageResidueDistances​(List<Atom[]> transformed)
        The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.

        Complexity: T(n,l) = O(l*n^2), if n=number of structures and l=alignment length.

        Parameters:
        transformed - List of Atom arrays containing only the aligned atoms of each structure, or null if there is a gap.
        Returns:
        Matrix containing all average residue distances. Entry -1 means there is a gap in the position.
      • transformAtoms

        public static List<Atom[]> transformAtoms​(MultipleAlignment alignment)
        Transforms atoms according to the superposition stored in the alignment.

        For each structure in the alignment, returns an atom for each representative atom in the aligned columns, omitting unaligned residues (i.e. an array of length alignment.length() ).

        All blocks are concatenated together, so Atoms may not appear in the same order as in their parent structure. If the alignment blocks contain null residues (gaps), then the returned array will also contain null Atoms in the same positions.

        Parameters:
        alignment - MultipleAlignment
        Returns:
        List of Atom arrays of only the aligned atoms of every structure (null Atom if a gap position)
      • getCorePositions

        public static List<IntegergetCorePositions​(Block block)
        Calculate a List of alignment indicies that correspond to the core of a Block, which means that all structures have a residue in that positon.
        Parameters:
        block - alignment Block
        Returns:
        List of positions in the core of the alignment
      • sortBlocks

        public static void sortBlocks​(List<Block> blocks,
                                      int sortedIndex)
        Sort blocks so that the specified row is in sequential order. The sort happens in place.
        Parameters:
        blocks - List of blocks
        sortedIndex - Index of the row to be sorted
      • getAlignedStructure

        public static final Structure getAlignedStructure​(List<Atom[]> atomArrays)
                                                   throws StructureException
        Get an artificial Structure containing a different model for every input structure, so that the alignment result can be viewed in Jmol. The Atoms have to be rotated beforehand.
        Parameters:
        atomArrays - an array of Atoms for every aligned structure
        Returns:
        a structure object containing a set of models, one for each input array of Atoms.
        Throws:
        StructureException
      • getRMSDMatrix

        public static Matrix getRMSDMatrix​(MultipleAlignment msa)
        Calculate the RMSD matrix of a MultipleAlignment, that is, entry (i,j) of the matrix contains the RMSD between structures i and j.
        Parameters:
        msa - Multiple Structure Alignment
        Returns:
        Matrix of RMSD with size the number of structures squared
      • getHSDMTree

        public static org.forester.phylogeny.Phylogeny getHSDMTree​(MultipleAlignment msta)
                                                            throws CompoundNotFoundException,
                                                                   IOException
        Calculate a phylogenetic tree of the MultipleAlignment using dissimilarity scores (DS), based in SDM Substitution Matrix (ideal for distantly related proteins, structure-derived) and the Neighbor Joining algorithm from forester.
        Parameters:
        msta - MultipleAlignment of protein structures
        Returns:
        Phylogeny phylogenetic tree
        Throws:
        CompoundNotFoundException
        IOException
      • getStructuralTree

        public static org.forester.phylogeny.Phylogeny getStructuralTree​(MultipleAlignment msta)
        Calculate a phylogenetic tree of the MultipleAlignment using RMSD distances and the Neighbor Joining algorithm from forester.
        Parameters:
        msta - MultipleAlignment of protein structures
        Returns:
        Phylogeny phylogenetic tree
        Throws:
        CompoundNotFoundException