Summer Training Exercises



Section - 1

Accessing Biological Sequences and Structure Databases


The objective of this exercise is to make students aware of the various biological databases and the information they hold.
Here we will learn to access NCBI for nucleic acid and protein sequences and PDB for the structures of proteins.

Primary Biological Databases

NCBI - National Center for Biotechnology Information

  1. Division of the National Library of Medicine (NLM) at the United States National Institutes of Health
  2. Established in November of 1988 at Bethesda, Maryland, USA.

EBI (European Bioinformatics Institute) - European Molecular Biology Laboratory (EMBL)

  1. EMBL (European Molecular Biology Laboratory) is Europe's flagship laboratory for the life sciences
  2. The EMBL Data Library, the forerunner of EMBL-EBI, was established in 1980 at the European Molecular Biology Laboratory in Heidelberg, Germany.
  3. In September 1994, EMBL-EBI was firmly established at the Wellcome Trust Genome Campus in Hinxton, UK.

DDBJ (DNA Data Bank of Japan) - National Institute of Genetics

  1. DDBJ Center collects nucleotide sequence data as a member of INSDC (International Nucleotide Sequence Database Collaboration)
  2. First Asian nucleotide database centre
  3. Active since 1986 at the National Institute of Genetics, Japan.

Universal Protein Resource (UniProt)

  1. UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR)
  2. It has been maintained by the UniProt consortium since 2002.

Protein Information Resource (PIR)

  1. Produced the Protein Sequence Database
  2. PIRSF (PIR SuperFamily): a protein classification system organized as a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationships of full-length proteins and domains.
  3. Established in 1984 by the National Biomedical Research Foundation (NBRF)

Protein Data Bank (structure db)

  1. Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
  2. The data are typically obtained by X-ray crystallography or NMR spectroscopy.
  3. Established in 1971 at Brookhaven National Laboratory
  4. In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB.
  5. The PDB database is currently available through:
    1. RCSB PDB: USA
    2. PDBe: Europe
    3. PDBj: Japan

Lab 1: Accessing NCBI

NCBI is a repository of all kinds of information on nucleotide and protein sequences. It has a database named GenBank for nucleotide sequences and one named GenPept for protein sequences. NCBI, EMBL and DDBJ are linked to each other: any information submitted at one site is shared with the others, and all three databases are updated within 24 hours.
Here we will query the nucleotide database for the human insulin gene sequence and then search for the protein sequence of insulin.


  1. Open the NCBI website.

  2. Select 'Nucleotide' and enter the name 'Human Insulin'.

  3. Click on the 'Search' button.

  4. You will now get a list of links, each leading to information about one sequence.

  5. Click on 'Human insulin gene, complete CDS' for more information about this gene.

  6. You can click on 'GenBank' to get the complete information about this gene.

  7. Or you can click on 'FASTA' to get the nucleotide sequence of this gene.

In the same way, you can search for the protein sequence of this query by selecting 'Protein' in the database option.
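
The same search can also be scripted. The sketch below is only an illustration and not part of the original exercise: it assumes NCBI's public E-utilities service (the `esearch.fcgi` and `efetch.fcgi` endpoints), and the helper names `esearch_url`, `efetch_url` and `parse_fasta` are hypothetical. It builds the request URLs and parses FASTA text; the sample record is made-up placeholder data, not the real insulin sequence.

```python
from urllib.parse import urlencode

# Assumed base address of NCBI's E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nucleotide", retmax=5):
    """Build a search URL, the scripted counterpart of steps 2 and 3."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_url(record_id, db="nucleotide"):
    """Build a URL that returns one record as FASTA text (step 7)."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EUTILS}/efetch.fcgi?{query}"


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder FASTA text for illustration only (not a real database record).
SAMPLE = "\n".join([
    ">example_1 made-up nucleotide record",
    "ATGGCCCTGT",
    "GGATGCGC",
])

if __name__ == "__main__":
    print(esearch_url("Human Insulin"))
    print(efetch_url("EXAMPLE_ID", db="protein"))  # EXAMPLE_ID is a placeholder
    for name, sequence in parse_fasta(SAMPLE):
        print(name, len(sequence))
```

Passing either URL to `urllib.request.urlopen` would perform the actual request; that step is left out so the sketch runs without network access. Changing `db` from "nucleotide" to "protein" mirrors selecting 'Protein' in the database option.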

Lab 2: Accessing PDB

The PDB is the single worldwide archive of experimentally determined 3-dimensional protein structures. These structures have been determined with either NMR or X-ray crystallography techniques. The Research Collaboratory for Structural Bioinformatics (RCSB) manages this database.
Here we will search for a protein structure of human insulin (PDB ID: 4EX0).


  1. Open the PDB database website.

  2. Enter 'human insulin' or the PDB ID '4EX0' in the search box and press Enter.

  3. This will display various results for related protein structures. Click on the name of the desired protein. It will show the complete information about this protein.

  4. You can download the PDB file of this protein by clicking on 'Download Files' --> 'PDB File (Text)'.

  5. This PDB file contains various information about the protein, such as the organism name, amino acid sequence, chains, atom coordinates, ligand molecule coordinates, heteroatom coordinates, etc.
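
The records mentioned in step 5 sit in fixed-width columns, so they can be read with plain string slicing. The following is a minimal sketch, assuming the column layout of the legacy PDB text format; the function names are hypothetical, and the three sample lines are made-up placeholders rather than lines taken from entry 4EX0.

```python
def parse_coordinates(pdb_text):
    """Collect ATOM and HETATM records as dictionaries."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,                    # ATOM or HETATM (heteroatom)
            "serial": int(line[6:11]),           # atom serial number
            "name": line[12:16].strip(),         # atom name
            "residue": line[17:20].strip(),      # residue or ligand name
            "chain": line[21],                   # chain identifier
            "residue_number": int(line[22:26]),  # residue sequence number
            "x": float(line[30:38]),             # coordinates in angstroms
            "y": float(line[38:46]),
            "z": float(line[46:54]),
        })
    return atoms


def parse_seqres(pdb_text):
    """Return {chain ID: [three-letter residue names]} from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chains.setdefault(line[11], []).extend(line[19:70].split())
    return chains


# Made-up records for illustration only; the spacing follows the fixed columns.
SAMPLE_PDB = "\n".join([
    "SEQRES   1 A    3  GLY ILE VAL",
    "ATOM      1  N   GLY A   1      11.500  -2.250   3.000",
    "HETATM    2  O   HOH A 101       0.000   1.000   2.500",
])

if __name__ == "__main__":
    print(parse_seqres(SAMPLE_PDB))
    for atom in parse_coordinates(SAMPLE_PDB):
        print(atom["record"], atom["name"], atom["chain"],
              atom["x"], atom["y"], atom["z"])
```

To try this on the downloaded file from step 4, read its contents with `open(path).read()` and pass the text to either function. Slicing by column is used here instead of splitting on whitespace because adjacent fields in a PDB file may have no space between them.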
