Advanced Bioinformatics Center at BISR

BIRLA INSTITUTE OF SCIENTIFIC RESEARCH, JAIPUR

Summer Training Exercises

Day 3


Section - 1

Sequence and Genome Analysis

Various Genome Databases:

NCBI Genome Browser
http://www.ncbi.nlm.nih.gov/genome

Ensembl Genome Browser
http://www.ensembl.org/index.html

UCSC Genome Browser
http://hgdownload.cse.ucsc.edu/downloads.html

GENSCAN

GENSCAN is a program for identifying complete gene structures in genomic DNA. It is based on a generalized hidden Markov model (GHMM) and predicts the locations of genes and their exon-intron boundaries in genomic sequences from a variety of organisms. It also reports the predicted peptide sequence of each predicted gene in the given genomic DNA.

Steps:

Open GenScan webpage: http://genes.mit.edu/GENSCAN.html

Select 'Organism' - choose the option that matches the organism of the input DNA sequence.

Set 'Suboptimal exon cutoff' value - use 0.5 for better prediction

Choose 'Print Option' - peptide predictions only, or both peptide and CDS predictions.

Browse for or paste the genomic DNA sequence (less than 1 Mb of data).

Click on 'Run GenScan' button for the prediction results.
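Before submitting, it can be useful to check the size of the input locally. The following is a minimal Python sketch, not part of GENSCAN itself. The function names are hypothetical, and it assumes the "less than 1 Mb" limit means fewer than one million sequence characters; the server may measure the limit differently.

```python
# Assumption: the 1 Mb limit is read as one million bases.
MAX_BASES = 1_000_000


def fasta_length(fasta_text):
    """Count sequence characters in FASTA text, skipping header and blank lines."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


def fits_size_limit(fasta_text, limit=MAX_BASES):
    """Return True if the sequence is shorter than the assumed limit."""
    return fasta_length(fasta_text) < limit


example = ">demo\nATGGCC\nTTAA\n"
print(fasta_length(example), fits_size_limit(example))
```

For several records in one file, the function counts all of them together, which is the conservative choice for a size check.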

Terms used in Results of GenScan:

Gn.Ex : gene number, exon number (for reference)

Type : Init = Initial exon (ATG to 5' splice site)

Intr = Internal exon (3' splice site to 5' splice site)

Term = Terminal exon (3' splice site to stop codon)

Sngl = Single-exon gene (ATG to stop)

Prom = Promoter (TATA box / initiation site)

PlyA = poly-A signal (consensus: AATAAA)

S : DNA strand (+ = input strand; - = opposite strand)

Begin : beginning of exon or signal (numbered on input strand)

End : end point of exon or signal (numbered on input strand)

Len : length of exon or signal (bp)

Fr : reading frame (a forward strand codon ending at x has frame x mod 3)

Ph : net phase of exon (exon length modulo 3)

I/Ac : initiation signal or 3' splice site score (tenth bit units)

Do/T : 5' splice site or termination signal score (tenth bit units)

CodRg : coding region score (tenth bit units)

P : probability of exon (sum over all parses containing exon)

Tscr : exon score (depends on length, I/Ac, Do/T and CodRg scores)
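Several of these columns are simple arithmetic on the coordinates, which can be sketched in Python as follows. This is an illustration of the definitions above, not GENSCAN's own code, and the function names are hypothetical. It assumes Begin and End are 1-based inclusive positions, and that on the - strand Begin may be larger than End.

```python
POLYA_SIGNAL = "AATAAA"  # consensus given for PlyA above


def exon_length(begin, end):
    """Len: number of bases from Begin to End inclusive, on either strand."""
    return abs(end - begin) + 1


def exon_phase(begin, end):
    """Ph: net phase of the exon, i.e. exon length modulo 3."""
    return exon_length(begin, end) % 3


def codon_frame(codon_end):
    """Fr: a forward-strand codon ending at position x has frame x mod 3."""
    return codon_end % 3


def find_polya_signals(seq):
    """Return 1-based start positions of AATAAA on the input strand."""
    seq = seq.upper()
    positions = []
    start = seq.find(POLYA_SIGNAL)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = seq.find(POLYA_SIGNAL, start + 1)
    return positions


print(exon_length(101, 250), exon_phase(101, 250))
print(find_polya_signals("GGAATAAACC"))
```

An exon whose length is a multiple of 3 has phase 0, so removing it would not shift the reading frame of the exons that follow.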

ORF Finder

The ORF Finder (Open Reading Frame Finder) is a graphical analysis tool which finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database.

This tool identifies all open reading frames using the standard or alternative genetic codes.

The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the WWW BLAST server. The ORF Finder should be helpful in preparing complete and accurate sequence submissions. It is also packaged with the Sequin sequence submission software.

Steps:

Open ORF Finder webpage: http://www.ncbi.nlm.nih.gov/gorf/gorf.html

Paste the sequence in FASTA format or provide the accession number of the sequence.

Provide the nucleotide range if needed.

Specify genetic code.

Click on 'Run Orffind' button for the prediction results.
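The idea of an open reading frame search can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's implementation: it assumes the standard genetic code only, treats ATG as the only start codon, and reports an ORF from the first ATG in a frame to the next in-frame stop codon. The function names and the min_len parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_len):
    """Return (start, end) pairs, 0-based half-open, for ATG..stop ORFs."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this open stretch
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    found.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return found


def find_orfs(seq, min_len=30):
    """Scan all six frames; minus-strand coordinates refer to the reverse complement."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, end in orfs_in_strand(s, min_len):
            results.append((strand, start, end, s[start:end]))
    return results


print(find_orfs("CCATGAAATAGCC", min_len=9))
```

Raising min_len filters out short ORFs that are likely to occur by chance, which mirrors the selectable minimum size mentioned above.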

BISR Primer Machine

Steps:

Open BISR Primer webpage: http://bioinfo.bisr.res.in/cgi-bin/project/primer/index.cgi

Paste the sequence in FASTA format or provide the accession number of the sequence.

Provide the nucleotide range if required.

You can also use the provided test file if you only want to try out the tool.

Click on 'Proceed' button for the prediction results.

The result lists the nucleotide positions in the given sequence, ordered by rank.

VISTA - Tools for Comparative Genomics

mVISTA - Align and compare your sequences from multiple species.

gVISTA - Compare your sequences against whole-genome assemblies.

wgVISTA - Align a pair of sequences up to 10 Mb long (finished or draft), including microbial whole-genome assemblies.

Steps for mVISTA:

Open the VISTA tool from the link http://genome.lbl.gov/vista/index.shtml

On the first page you will be asked to specify the number of genomic sequences you want to analyze. Entering this number and clicking "submit" takes you to the main submission page, which contains one input field for each sequence (up to 100 sequences can be processed).

You will be asked for an email address so that you can be notified when the results are ready.

Upload each query sequence (previously saved in FASTA format) using the "Browse" button.

Alternatively, you can specify its GenBank accession number, which will be used to retrieve the sequence automatically from the GenBank database and process it on the VISTA server.

Click Submit.