Orthologous group clustering – description of utility

Below is a step-by-step guide to using the Orthogroup Excel file, which can be found here: http://euglenadb.org/EuglenaDB_Menu/Sequence/Transcriptome/E_gracilis_transcriptome_v1.0_final/Orthologous_groups/

We will use the Rab protein family as an example to show how to find orthologs. (Paralogs have to be identified manually, using BLAST with a suitable threshold.)

1. Suppose you are interested in Euglena gracilis Rab 1 proteins. Start from the sequence ID of a known T. brucei Rab 1, which you can look up in TriTrypDB (http://tritrypdb.org/tritrypdb/showApplication.do).

2. Copy the T. brucei Rab 1 sequence ID (Tb09.v4.0017) from TriTrypDB and search for it in the Orthogroup Excel spreadsheet (on a PC: Home > Find & Select > Find, then press Enter). This should take you to T. brucei Rab 1 (column AD, row 16 in the Orthogroup spreadsheet), possibly alongside other sequence IDs in the same cluster, which are also Rabs. The orthogroup file clusters sequences at the family level.

3. Once you have found the T. brucei Rab 1 sequence ID (Tb09.v4.0017), which should be in column AD (under the heading T. brucei), row 16, locate the corresponding E. gracilis entry on the same row (row 16) under column N. The sequence IDs in that cell (column N, row 16) are the E. gracilis Rabs/Rab 1 that clustered with the T. brucei sequence.

4. Copy all the E. gracilis sequence IDs in that cell (column N, row 16) and use them to extract the corresponding sequences from the transcriptome FASTA file (E_gracilis_transcriptome_final.PROTEINS.fasta).

5. Repeat this process for each protein or protein family you are interested in.
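For anyone who prefers to script the steps above rather than search by hand, here is a minimal Python sketch. It is not part of EuglenaDB and makes several assumptions: that the spreadsheet has been exported to CSV, that the species columns are headed by names such as "E. gracilis" (the real headers may differ), that multiple IDs in one cell are separated by commas or whitespace, and that each FASTA header begins with the sequence ID. The function names and the toy IDs below (other than Tb09.v4.0017) are made up for illustration.

```python
import csv
import io
import re


def split_ids(cell):
    """Split one spreadsheet cell into sequence IDs.

    Assumption: IDs are separated by commas and/or whitespace.
    """
    return [tok for tok in re.split(r"[,\s]+", cell.strip()) if tok]


def find_orthologs(csv_handle, query_id, target_column):
    """Return the IDs in target_column on the row that contains query_id."""
    for row in csv.DictReader(csv_handle):
        for cell in row.values():
            # Skip empty cells and any non-string leftovers from ragged rows.
            if isinstance(cell, str) and query_id in split_ids(cell):
                return split_ids(row.get(target_column) or "")
    return []  # query ID not found anywhere in the table


def extract_fasta(fasta_handle, wanted_ids):
    """Return {sequence ID: sequence} for the wanted FASTA records.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    wanted = set(wanted_ids)
    records = {}
    current = None
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            current = seq_id if seq_id in wanted else None
            if current is not None:
                records[current] = []
        elif current is not None and line:
            records[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


# Toy data standing in for the CSV export and the protein FASTA file.
toy_table = io.StringIO(
    "Orthogroup,E. gracilis,T. brucei\n"
    'OG_toy,"EG_toy_1, EG_toy_2","Tb09.v4.0017, Tb_toy_9"\n'
)
toy_fasta = io.StringIO(">EG_toy_1 example\nMAAA\nCCC\n>EG_other\nMTTT\n>EG_toy_2\nMGG\n")

ids = find_orthologs(toy_table, "Tb09.v4.0017", "E. gracilis")
for seq_id, seq in extract_fasta(toy_fasta, ids).items():
    print(f">{seq_id}\n{seq}")
```

With the real files you would replace the toy handles with open("your_export.csv", newline="") and open("E_gracilis_transcriptome_final.PROTEINS.fasta"), and check the column headers and cell separators against the actual spreadsheet first.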

You may use any species (not necessarily T. brucei), provided you know the sequence IDs for your protein of interest and are sure that the sequence ID you picked up from a protein database is the same as the one used in the orthologous group file. For instance, you may decide to use Chlamydomonas reinhardtii, in which case you would go to the JGI website and look up the Rab sequence IDs for C. reinhardtii.

Posted in Uncategorized | Leave a comment

Orthogroup clustering

We have updated the EuglenaDB with the orthogroup clustering. To access the file, please point your browser to:  http://euglenadb.org/EuglenaDB_Menu/Sequence/Transcriptome/

This is a .xlsx file named “parsed_ortho_groups.xlsx” with all the identified genes, and it is “a good starting point for any presence and absence analysis”.

The first column represents the orthogroup numbers or gene families, which are “numbered from largest to smallest”. For instance, “OG000000 is the largest gene family”.

The subsequent columns represent the taxa/species, in alphabetical order. To look for a particular gene in Euglena gracilis, download the .xlsx file and search for the gene using a corresponding gene identifier from any species represented in the clustering. The E. gracilis cell on the row where that identifier appears tells you whether the gene is present or absent.
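If you need to check many genes at once, the lookup described above can be scripted. The following Python sketch is only an illustration under stated assumptions: the sheet has been exported to CSV, the E. gracilis column is headed "E. gracilis" (check the real header), IDs within a cell are separated by commas or whitespace, and an empty E. gracilis cell means the gene is absent. The function name presence_table and the toy IDs are hypothetical.

```python
import csv
import io
import re


def presence_table(csv_handle, query_ids, target_column):
    """Map each query ID to 'present', 'absent', or 'not found'.

    'present'   : the ID's orthogroup row has at least one entry in target_column
    'absent'    : the ID's orthogroup row has an empty target_column cell
    'not found' : the ID does not occur anywhere in the table
    """
    status = {query: "not found" for query in query_ids}
    for row in csv.DictReader(csv_handle):
        ids_in_row = set()
        for cell in row.values():
            if isinstance(cell, str):
                ids_in_row.update(tok for tok in re.split(r"[,\s]+", cell) if tok)
        target_cell = (row.get(target_column) or "").strip()
        for query in query_ids:
            if query in ids_in_row:
                status[query] = "present" if target_cell else "absent"
    return status


# Toy table with made-up IDs, standing in for a CSV export of the spreadsheet.
toy_table = io.StringIO(
    "Orthogroup,E. gracilis,T. brucei\n"
    'OG_a,"EG_toy_1, EG_toy_2",Tb_toy_1\n'
    "OG_b,,Tb_toy_2\n"
)
result = presence_table(toy_table, ["Tb_toy_1", "Tb_toy_2", "Tb_toy_3"], "E. gracilis")
for query, state in result.items():
    print(query, state)
```

A "not found" result is worth double-checking by hand, since it may simply mean the identifier format in your source database differs from the one used in the orthogroup file.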


E. gracilis transcriptome – final

We have updated the EuglenaDB database with the final draft of the E. gracilis transcriptome. The updated version (v1.0) is in the folder named E_gracilis_transcriptome_v1.0_final, under the Sequence > Transcriptome menu (http://euglenadb.org/EuglenaDB_Menu/Sequence/Transcriptome/).


Meetings

No meetings have been scheduled recently.
