Orthologous group clustering – description of utility

Below is a step-by-step guide to using the Orthogroup Excel file, which can be found here: http://euglenadb.org/EuglenaDB_Menu/Sequence/Transcriptome/E_gracilis_transcriptome_v1.0_final/Orthologous_groups/

We will use the Rab protein family as an example to show how to find orthologs. (To identify paralogs, you need to do this manually using BLAST with a specific threshold.)

1. Suppose you are interested in Euglena gracilis Rab 1 proteins. Use the sequence ID of a known corresponding T. brucei Rab 1 in TriTrypDB (http://tritrypdb.org/tritrypdb/showApplication.do) as your starting point.

2. Copy the T. brucei Rab 1 sequence ID (Tb09.v4.0017) from TriTrypDB and search for it in the Orthogroup Excel spreadsheet (on a PC: Home > Find & Select > Find, then press ENTER). This should take you to T. brucei Rab 1 (column AD, row 16 in the Orthogroup spreadsheet), where it may be clustered with other sequence IDs that are also Rabs. The orthogroup file groups (clusters) sequences at the family level.

3. Once you have found the T. brucei Rab 1 sequence ID (Tb09.v4.0017) in column AD (under the heading T. brucei), row 16, locate the corresponding E. gracilis sequences on the same row (row 16) under column N. The sequence IDs in that cell (column N, row 16) are the E. gracilis Rabs/Rab 1 that clustered with T. brucei.

4. Copy all the E. gracilis sequence IDs in that cell (column N, row 16) and use them to extract the corresponding sequences from the transcriptome FASTA file (E_gracilis_transcriptome_final.PROTEINS.fasta).

5. Repeat this process for each protein or protein family you are interested in.
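
If you have many proteins to look up, steps 2 to 4 could be scripted rather than done by hand. The following is only a minimal sketch, not part of the original workflow. It assumes the Orthogroup spreadsheet has been exported from Excel as tab-separated text, that the species column headings read "E. gracilis" and "T. brucei", and that multiple IDs in one cell are separated by commas, semicolons or spaces. The function names and every ID other than Tb09.v4.0017 are made-up placeholders, and the small in-memory tables stand in for the real files.

```python
import csv
import io
import re


def split_ids(cell):
    """Split one spreadsheet cell into individual sequence IDs.

    Assumption: IDs in a cell are separated by commas, semicolons or whitespace.
    """
    return [tok for tok in re.split(r"[,;\s]+", cell.strip()) if tok]


def ids_clustered_with(table_handle, query_id, target_column):
    """Find the row containing query_id and return the IDs under target_column."""
    reader = csv.reader(table_handle, delimiter="\t")
    header = next(reader)
    target_index = header.index(target_column)
    for row in reader:
        # Compare whole IDs so that a partial match on a longer ID is not a hit.
        if any(query_id in split_ids(cell) for cell in row):
            return split_ids(row[target_index]) if target_index < len(row) else []
    return []


def extract_fasta(fasta_handle, wanted_ids):
    """Return {sequence ID: sequence} for the wanted IDs.

    Assumption: the sequence ID is the first word after '>' on each header line.
    """
    wanted = set(wanted_ids)
    found = {}
    current = None
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            current = seq_id if seq_id in wanted else None
            if current is not None:
                found[current] = ""
        elif current is not None:
            found[current] += line
    return found


# Placeholder data standing in for the exported spreadsheet and the FASTA file.
TABLE = (
    "Orthogroup\tE. gracilis\tT. brucei\n"
    "OG_demo_1\tEG_demo_1, EG_demo_2\tTb09.v4.0017\n"
    "OG_demo_2\tEG_demo_3\tTb_demo_9\n"
)
FASTA = (
    ">EG_demo_1 placeholder description\nMAAA\nCCC\n"
    ">EG_demo_2\nMGGG\n"
    ">EG_demo_3\nMTTT\n"
)

hits = ids_clustered_with(io.StringIO(TABLE), "Tb09.v4.0017", "E. gracilis")
sequences = extract_fasta(io.StringIO(FASTA), hits)
for seq_id, seq in sequences.items():
    print(f">{seq_id}\n{seq}")
```

With the real files, you would replace the two `io.StringIO(...)` objects with `open(...)` on your exported table and on E_gracilis_transcriptome_final.PROTEINS.fasta, after checking that the column headings and ID separators match the assumptions above.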

You may use any species (not necessarily T. brucei), provided you know the sequence IDs for your protein of interest and are sure that the sequence ID you picked up from a protein database is the same as that used in the orthologous group file. For instance, if you decide to use Chlamydomonas reinhardtii, go to the JGI website and find the Rab sequence IDs for C. reinhardtii.
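
One way to check that IDs taken from another database match those used in the orthologous group file is to test them against the relevant species column before searching. The sketch below is an illustration only. It again assumes a tab-separated export of the spreadsheet and a column heading of "C. reinhardtii", and all IDs and function names in it are hypothetical placeholders.

```python
import csv
import io
import re


def ids_in_column(table_handle, species_column):
    """Collect every sequence ID listed under one species column."""
    reader = csv.reader(table_handle, delimiter="\t")
    header = next(reader)
    index = header.index(species_column)
    ids = set()
    for row in reader:
        if index < len(row):
            # Assumption: IDs in a cell are separated by commas, semicolons or whitespace.
            ids.update(tok for tok in re.split(r"[,;\s]+", row[index].strip()) if tok)
    return ids


def split_known_unknown(candidate_ids, table_ids):
    """Separate candidate IDs into those present in the table and those absent."""
    known = [i for i in candidate_ids if i in table_ids]
    unknown = [i for i in candidate_ids if i not in table_ids]
    return known, unknown


# Placeholder table; the IDs are invented for the example.
TABLE = (
    "Orthogroup\tE. gracilis\tC. reinhardtii\n"
    "OG_demo_1\tEG_demo_1\tCre_demo_1, Cre_demo_2\n"
    "OG_demo_2\tEG_demo_2\t\n"
)

table_ids = ids_in_column(io.StringIO(TABLE), "C. reinhardtii")
known, unknown = split_known_unknown(["Cre_demo_1", "Cre_demo_1.t1"], table_ids)
print("found in table:", known)
print("not found (check ID format or version):", unknown)
```

An ID that is reported as not found may simply be written differently in the source database (for example with a transcript or version suffix), so it is worth comparing the formats by eye before concluding that the protein is absent.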
