Pairwise and multiple sequence alignments
Purpose: to become familiar with multiple alignment software (ClustalW)
available on the internet, and to demonstrate the differences between protein and
nucleotide alignments and the effect of the chosen parameters (such as similarity
matrices and gap penalties) on the alignments.
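To get a feel for how these parameters act before using the server, the following is a minimal sketch of a pairwise global alignment (Needleman-Wunsch) with a pluggable scoring function and gap penalty. It is a simplification, not what ClustalW itself does: ClustalW builds multiple alignments progressively and uses separate gap opening and gap extension penalties, whereas this sketch aligns two sequences with a single per-position gap penalty. The function names, the toy score values, and the example sequences are all assumptions made for illustration.

```python
def global_alignment(a, b, score, gap):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    score(x, y) gives the substitution score for residues x and y;
    gap is the (negative) score charged for every gap position.
    Returns (best_score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_score(x, y):
    """Toy identity-style scoring: only exact matches are rewarded."""
    return 1 if x == y else -1


# Hypothetical groups of chemically similar amino acids (NOT a BLOSUM matrix).
SIMILAR_GROUPS = [set("ILVM"), set("KR"), set("DE")]


def toy_similarity_score(x, y):
    """Toy similarity scoring: partial credit for conservative substitutions."""
    if x == y:
        return 2
    if any(x in g and y in g for g in SIMILAR_GROUPS):
        return 1
    return -1


if __name__ == "__main__":
    # Same DNA pair, two gap penalties: a cheap gap lets the aligner open gaps,
    # an expensive gap forces it to accept mismatches instead.
    for gap in (-1, -5):
        print(gap, global_alignment("ACGTACGT", "ACGACGTT", identity_score, gap))
    # Same protein pair, two scoring schemes.
    for fn in (identity_score, toy_similarity_score):
        print(fn.__name__, global_alignment("MKVLD", "MRILE", fn, -1))
```

With the cheap gap penalty the two DNA strings are aligned with gaps, while the expensive one yields an ungapped alignment with several mismatches. The same trade-off is what the gap opening and gap extension settings on the ClustalW form control.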
Cytochrome B
Cytochromes are mostly membrane-bound proteins that contain heme groups and
carry out electron transport or catalyze reductive/oxidative reactions.
In eukaryotes, cytochromes are found in the inner mitochondrial membrane
and in the endoplasmic reticulum
(for more information, see Campbell p. 170).
- The CytBProt file contains the amino acid
sequences of the cytochrome B proteins encoded in the mitochondrial genomes of
16 vertebrate species. The sequences are labeled with the species name.
Take a look at these sequences. Will very long gaps be necessary to align
them? Why?
- Use ClustalW at
EBI (or one of the other ClustalW servers listed on the bar, if the EBI server
does not work) to make a multiple alignment of these protein sequences. View the
alignment in color. First make the alignment using a BLOSUM matrix, and then
using the identity matrix. What effect does the choice of scoring matrix have
on your alignment? Can you identify conserved regions that are
longer than 10 amino acids?
- The CytBDNA file contains the nucleotide sequences
of the cytochrome B genes from the same species. Make an alignment of the DNA
sequences (DNA alignments take longer, so be patient). What is the
difference between the DNA and protein alignments? How do you explain it?
- If you make the DNA alignment with the default parameter values, do you see
anything "strange" in the alignment?
Which parameters affect this "strange" finding?
Are there gaps within the sequences, and if so, how large are they? Why?
- Hexokinases are enzymes that phosphorylate hexoses (mainly glucose). After
phosphorylation, the sugar can enter intracellular metabolic
pathways.
The Hexokinase file contains the amino acid
sequences of hexokinases from human and dog. Take a look at the sequences in
this file. Will long gaps be necessary in this alignment? Perform the alignment.
- What can you say about the conservation of hexokinases based on this limited data
set? Which has evolved faster, the hexokinases or cytochrome B?
- Let us now take a cytochrome B protein sequence from another kingdom, for
example from Arabidopsis thaliana. Find this sequence at NCBI (use the pull-down
menu to search NCBI Protein, and take the protein sequence with the identifier
CYB_ARATH). Realign your vertebrate sequences together with this plant sequence.
Are the regions you previously identified as conserved still conserved?
Examine the Entrez entry for CYB_ARATH.
Can you conclude anything about the functional properties of the conserved regions?
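The questions above about conserved regions longer than 10 amino acids can also be checked programmatically once an alignment has been saved. The following is a minimal sketch, assuming the aligned sequences have already been read into equal-length strings with "-" marking gaps. It reports runs of consecutive columns in which every sequence carries the same residue (the columns ClustalW marks with "*"). The function name and the short example sequences are invented for illustration and are not taken from the CytBProt file.

```python
def conserved_runs(aligned, min_len=11):
    """Find runs of fully conserved, gap-free alignment columns.

    aligned: list of equal-length aligned sequences ('-' marks a gap).
    min_len: minimum run length to report (11 means "longer than 10").
    Returns a list of (start, end, residues) with 0-based, half-open columns.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    start = None  # first column of the run currently being extended
    # Iterate one column past the end so that a run touching the end is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if conserved:
            if start is None:
                start = col
        else:
            if start is not None and col - start >= min_len:
                runs.append((start, col, aligned[0][start:col]))
            start = None
    return runs


if __name__ == "__main__":
    # Made-up toy alignment of three sequences, for illustration only.
    toy = [
        "AAWGATVITNLLSAIPYIG",
        "SAWGATVITNLLSAVPYIG",
        "A-WGATVITNLLSAIPYIG",
    ]
    for start, end, residues in conserved_runs(toy):
        print(f"columns {start}-{end - 1}: {residues}")
```

Running the same check on the vertebrate-only alignment and again after adding the plant sequence gives a direct way to compare which conserved stretches survive. This sketch counts only strictly identical columns, so columns with conservative substitutions break a run.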