Pairwise and multiple sequence alignments


Purpose: to become familiar with multiple alignment software (ClustalW) available on the internet, and to demonstrate the differences between protein and nucleotide alignments and the effect of the chosen parameters (such as similarity matrices and gap penalties) on the alignments.


Cytochrome B

Cytochromes are mostly membrane-bound proteins that contain heme groups and carry out electron transport or catalyze reductive/oxidative reactions. In eukaryotes, cytochromes are found in the inner mitochondrial membrane and in the endoplasmic reticulum (for more information, see Campbell p. 170).

  1. The CytBProt file contains the amino acid sequences of the cytochrome B proteins encoded in the mitochondrial genomes of 16 vertebrate species. The sequences are labeled with the species name. Take a look at these sequences. Will very long gaps be necessary to align them? Why?
  2. Use ClustalW at EBI (or one of the other ClustalW servers listed on the bar if the EBI server does not work) to make a multiple alignment of these protein sequences. View the alignment in color. First make the alignment using a BLOSUM matrix, and then using the identity matrix. What effect do the different scoring matrices have on your alignment? Can you identify conserved regions longer than 10 amino acids?
  3. The CytBDNA file contains the nucleotide sequences of the cytochrome B genes from the same species. Make an alignment of the DNA sequences (DNA alignments take longer, so be patient). What is the difference between the DNA and protein alignments? How do you explain it?
  4. If you make the DNA alignment with the default parameter values, do you see anything "strange" in it? Which parameters affect this "strange" finding? Are there gaps within the sequences, and if so, how large are they? Why?
  5. Hexokinases are enzymes that phosphorylate hexoses (mainly glucose). After phosphorylation, the sugar is ready to enter intracellular metabolic pathways. The Hexokinase file contains the amino acid sequences of hexokinases from human and dog. Take a look at the sequences in this file. Will long gaps be necessary in this alignment? Perform the alignment.
  6. What can you say about the conservation of hexokinases based on this limited data set? Which has evolved faster, hexokinase or cytochrome B?
  7. Let us now take a cytochrome B protein sequence from another kingdom, for example from Arabidopsis thaliana. Find this sequence at NCBI (use the pull-down menu to search NCBI Protein, and take the protein sequence with the identifier CYB_ARATH). Realign your vertebrate sequences with this plant sequence. Are the regions you previously identified as conserved still conserved? Examine the Entrez entry for CYB_ARATH. Can you conclude anything about the functional properties of the conserved regions?
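
Several of the questions above (conserved regions longer than 10 amino acids, and which protein is better conserved) can also be checked numerically once an alignment has been saved. The following is only a minimal sketch, assuming the aligned sequences have already been read into equal-length Python strings with "-" marking gaps. The function names conserved_runs and percent_identity and the toy sequences are made up for illustration; they are not part of ClustalW and are not real cytochrome B data.

```python
def conserved_runs(aligned, min_len=10):
    """Return (start, end) column ranges (0-based, end exclusive) in which
    every sequence carries the same non-gap residue for >= min_len columns."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    start = None  # first column of the run currently being extended
    for col in range(length):
        residues = {seq[col] for seq in aligned}
        identical = len(residues) == 1 and "-" not in residues
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    # Close a run that reaches the end of the alignment.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


def percent_identity(a, b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment (invented strings, not real sequences).
toy = [
    "ACDEFGHIKLMNPQRSTV",
    "ACDEFGHIKLWNPQRSTV",
    "AC-EFGHIKLMNPARSTV",
]

print(conserved_runs(toy, min_len=5))  # [(3, 10)]
print(round(percent_identity(toy[0], toy[1]), 1))
```

With real data, min_len=10 corresponds to the threshold in question 2, and comparing percent_identity for the human and dog hexokinases with the value for the same two species in the cytochrome B alignment gives a rough first indication for question 6. Note that this sketch counts only strictly identical columns, so it ignores the conservative substitutions that a BLOSUM matrix would reward.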