|
|
Goal: to become familiar with software for phylogenetic analysis, to demonstrate the differences between species trees and gene trees, and to analyse genetic events in a protein family.
We will perform the phylogenetic analysis with the tree-building tools of the ClustalW package. See the Hints section to learn how to make phylogenetic trees.
Remarks: If you are having trouble with one of the questions, have a look at the Hints section. Write down your answers, either on paper or in digital form (e.g. a Word document). It is important to make notes! Write down for yourself (broadly) what you have done and what your results were, and try to formulate a bottom line (in other words: what have you learned from the exercise?). If one of the webservers linked in the questions is offline or too slow, you may find alternative servers in the Links section. |
|
|
|
Let us return to the ubiquitin molecule, which you studied during the previous computer exercises. Recall what you learned about the evolution of ubiquitin during these exercises. |
1. |
Construct phylogenies using protein and DNA sequences. Which tree gives more evolutionary information? Why? If you use the EBI server to construct the tree (EBI Phylogeny), first align the sequences (EBI ClustalW) and then use the aligned sequences as input for the tree construction. |
2. |
Look up the species tree for these species in the Tree of Life project. A zoomed-out tree can be found here; use it to view all species in a single tree. Compare this species tree with the tree you made from the ubiquitin sequences. What is the difference? |
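Why the sequences must be aligned before a tree is built can be sketched as follows. Distance-based methods such as the neighbour-joining method in ClustalW start from pairwise distances, and these are counted column by column, so each column must hold corresponding positions. The function name `p_distance` and the toy sequences below are made up for illustration; this is a minimal sketch of an uncorrected distance, not the exact calculation the EBI server performs.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing positions between two ALIGNED sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Toy aligned sequences (invented for illustration, not real ubiquitin data).
alignment = {
    "seqA": "MQIFVKTL-GK",
    "seqB": "MQIFVKTLTGK",
    "seqC": "MQLFVRTLTGR",
}

names = sorted(alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(alignment[first], alignment[second])
        print(f"{first} vs {second}: {d:.3f}")
```

The same function works on aligned DNA sequences, so you can use it to compare how much variation the protein alignment and the DNA alignment each contain.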
|
|
|
We have collected the matrix proteins of 15 coronaviruses (note that only one of them is SARS) in Matrix proteins.
|
1. |
Make a phylogenetic analysis of these viruses based on the matrix protein. |
2. |
What does the tree suggest with regard to the origin of SARS? |
3. |
How does this compare to the relevant literature? |
4. |
OPTIONAL: Collect, for example, the spike proteins of these coronaviruses yourself. Start by retrieving the spike protein of a bat coronavirus from Entrez. Run a BLAST search to find other coronavirus spike proteins. Try to generate a dataset that includes the same species as the matrix protein dataset. Perform the phylogenetic analysis on this new dataset. Do you see the same relationships as with the matrix proteins? |
|
|
|
BLAST servers:
Alignment and Phylogeny webservers:
Tree of Life
NCBI Entrez
|
|
|
|
Ubiquitin
- Note that the protein sequences in this exercise are unaligned and in FASTA format, so they must be aligned before you can make a phylogenetic tree! To do this, first paste the sequences into the EBI ClustalW page and align them. To make a phylogenetic tree, paste the aligned sequences into the submission form on the EBI Phylogeny page, including the first line (the line that says CLUSTAL 2.1 multiple sequence alignment). The aligned sequences can also be downloaded by clicking on the file links (top of the ClustalW result page). It would of course be better if the EBI server aligned the sequences automatically before making a tree, but unfortunately it does not.
- Scroll down the page to see the tree. This is a phylogram; to get a cladogram, press the Show as Cladogram Tree button.
- From these two figures, can you tell the difference between a cladogram and a phylogram? Both are phylogenetic trees, but a phylogram not only shows the relationships between the taxa; its branch lengths also convey a sense of time or rate of evolution. A cladogram lacks this temporal aspect.
- In the parameters there is also a setting called CORRECT DIST. It corrects the evolutionary distances for multiple substitutions at the same site; for DNA sequences this is the Kimura 2-parameter correction (see your reader). Check whether switching it on changes the phylograms you produce.
- Remember that the trees you obtain are UNROOTED. To make a rooted tree, one has to use an outgroup, for example yeast ubiquitin in this exercise.
- On the Tree of Life Web Project page, the search function works only with the Latin names of the organisms. Alternatively, start at the root of the tree and 'click your way up' to the species you have in this question. The added advantage is that you get some very nice pictures along the way. Keep in mind that your trees were unrooted, that is, the root can be anywhere.
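The CORRECT DIST hint above can be made concrete with a small sketch of the Kimura 2-parameter distance for DNA. With P the proportion of transitions and Q the proportion of transversions between two aligned sequences, the corrected distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The function name `k2p_distance` and the toy sequences are invented for illustration, and it is an assumption that the server's handling of gaps and ambiguous bases matches what is done here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two ALIGNED DNA sequences.

    Columns with gaps or ambiguous bases are skipped (an assumption of
    this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy sequences: 20 aligned bases, one transition (A->G) and one
# transversion (C->A). Invented data, for illustration only.
seq_1 = "ACGTACGTACGTACGTACGT"
seq_2 = "GAGTACGTACGTACGTACGT"

raw = sum(a != b for a, b in zip(seq_1, seq_2)) / len(seq_1)
print(f"uncorrected proportion of differences: {raw:.4f}")
print(f"K2P-corrected distance:               {k2p_distance(seq_1, seq_2):.4f}")
```

The corrected distance is always at least as large as the raw proportion of differences, and the gap between the two grows as the sequences diverge. This is why the correction mostly affects the long branches of a phylogram.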
SARS
- With regard to the optional exercise: there are various ways of doing this. Remember that you can search for specific organisms by adding "organism"[ORGN] to your search (e.g. "Murine Hepatitis virus"[ORGN], "sars coronavirus"[ORGN] or "coronavirus"[ORGN]), although you will see that it is not always easy to find viruses this way, as most do not have a scientific name. Once you have found one spike protein, the others are most easily found by performing a BLAST search. Limit your BLAST search to viruses only. When the BLAST search is finished, scroll down to the alignments and select the sequences you want to retrieve. At the end of the page you will see a "Get selected sequences" button. Once you have your sequences in a file in FASTA format, edit the header line of each sequence (the line starting with ">") so that the first word indicates which virus the spike protein is from. Use unique names. You need this renaming to be able to see in your phylogenetic tree which sequence belongs to which virus. When you have finished preparing this dataset, move on to the sequence alignment and phylogeny. If you get stuck preparing this dataset, you can also use the ready-made Spikes file.
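The header renaming described above can also be done with a short script instead of by hand. The sketch below assumes you supply one short, unique, single-word label per sequence, in the same order as the sequences appear in the file. The function name `rename_fasta_headers`, the labels, and the example headers and residues are all hypothetical placeholders, not real database records.

```python
from typing import Sequence


def rename_fasta_headers(fasta_text: str, labels: Sequence[str]) -> str:
    """Put a short unique label in front of each FASTA header line.

    labels[i] is applied to the i-th sequence in the file; the original
    header text is kept after the label so no information is lost.
    """
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    if any(len(label.split()) != 1 for label in labels):
        raise ValueError("each label must be a single word without spaces")

    renamed = []
    index = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if index >= len(labels):
                raise ValueError("more sequences than labels")
            renamed.append(f">{labels[index]} {line[1:].strip()}")
            index += 1
        else:
            renamed.append(line)  # sequence lines are left untouched
    if index != len(labels):
        raise ValueError("more labels than sequences")
    return "\n".join(renamed) + "\n"


# Placeholder input: headers and residues are invented for illustration.
example = (
    ">placeholder_id_1 spike glycoprotein [SARS coronavirus]\n"
    "MFIFLLFLTL\n"
    ">placeholder_id_2 spike glycoprotein [Murine hepatitis virus]\n"
    "MLFVFILLLP\n"
)

print(rename_fasta_headers(example, ["SARS", "MHV"]), end="")
```

Because most tree-drawing programs label the leaves with the first word of the header, the chosen labels are what you will see in the tree.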
|
|
|