Summer Training MAY-JULY, 2015
Section - 1
Accessing Biological Sequences and Structure Databases
The objective of this exercise is to make students aware of the various kinds of biological information and the
databases available. The two major components in biological studies are nucleic acids and proteins. Various
databases hold information on both, such as NCBI-GenBank, EBI, DDBJ, UniProt, PDB and PIR
as primary databases and PFAM, PROSITE, TrEMBL etc. as secondary databases.
Here we will learn to access NCBI for nucleic acid and protein sequences and PDB for protein structures.
Primary Biological Databases
GenBank - National Center for Biotechnology Information
- Division of the National Library of Medicine (NLM) at the United States National Institutes of Health
- Established in November of 1988 at Bethesda, Maryland, USA.
EBI (European Bioinformatics Institute) - European Molecular Biology Laboratory (EMBL)
- EMBL (European Molecular Biology Laboratory) is Europe's flagship laboratory for the life sciences
- Its nucleotide sequence database was established in 1980 at the European Molecular Biology Laboratory in Heidelberg, Germany
- In September 1994, EMBL-EBI was firmly established at the Wellcome Trust Genome Campus in Hinxton, UK.
DDBJ (DNA Data Bank of Japan) - National Institute of Genetics
- DDBJ Center collects nucleotide sequence data as a member of INSDC (International Nucleotide Sequence Database Collaboration)
- First Asian nucleotide database centre
- Active since 1986 at the National Institute of Genetics, Japan.
Universal Protein Resource (UniProt)
- UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR)
- Since 2002, it has been maintained by the UniProt Consortium
Protein Information Resource (PIR)
- Produced the Protein Sequence Database
- PIR Superfamily (PIRSF): the PIRSF protein classification system is a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationships of full-length proteins and domains.
- Established in 1984 by the National Biomedical Research Foundation (NBRF)
Protein Data Bank (structure db)
- Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
- The data are typically obtained by X-ray crystallography or NMR spectroscopy
- Established in 1971 at Brookhaven National Laboratory
- In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB.
- As of Tuesday, Jun 10, 2014 at 5 PM PDT there were 100843 structures. The PDB archive is updated each week at the target time of Wednesday 00:00 UTC.
- The PDB database is available through:
- RCSB PDB: USA
- PDBe: Europe
- PDBj: Japan
Lab 1: Accessing NCBI
NCBI is a repository of all kinds of information on nucleotide sequences as well as protein sequences. It has
databases named GenBank for nucleotide sequences and GenPept for protein sequences. NCBI, EMBL and DDBJ are
linked to each other: they share their data and update their databases with the information submitted at any one site
within 24 hours.
Here we will search the nucleotide database for the human insulin gene sequence and then search for the protein sequence of insulin.
- Open the link of NCBI (http://www.ncbi.nlm.nih.gov)
- Select 'Nucleotide' and enter name 'Human Insulin'.
- Click the 'Search' button.
- You will now get a list of links to various sequence records.
- Click on 'Human insulin gene, complete CDS' for more information on this gene.
- You can click on 'GenBank' to get the complete information on this gene.
- Or you can click on 'Fasta' to get the nucleotide sequence of this gene.
In the same way, you can search for the protein sequence of this query by selecting 'Protein' in the database option.
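The search steps above can also be sketched in a script. The example below is a minimal sketch, assuming NCBI's E-utilities web service (esearch.fcgi and efetch.fcgi), which is not part of the original exercise. The function names esearch_url, efetch_fasta_url and parse_fasta are hypothetical helpers written for this illustration. The code only builds the request URLs and parses FASTA text, so it runs without a network connection.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service (an assumption of this
# sketch; the lab itself uses the NCBI website in a browser).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=5):
    """Build the URL that searches database `db` for `term`."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_fasta_url(db, record_id):
    """Build the URL that returns one record of `db` in FASTA format."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EUTILS}/efetch.fcgi?{query}"


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Same query as in the lab, for the nucleotide and the protein database.
print(esearch_url("nucleotide", "Human Insulin"))
print(esearch_url("protein", "Human Insulin"))
```

To query the live service, each URL can be passed to urllib.request.urlopen and the response decoded as text. The record ID given to efetch_fasta_url has to come from the search result and is not fixed here.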
Lab 2: Accessing PDB
PDB is the single worldwide repository that stores the 3-dimensional structures of proteins. These structures have been
determined by either NMR or X-ray crystallography. The Research Collaboratory for Structural Bioinformatics (RCSB)
manages this database.
Here we will search for a protein structure of human insulin (PDB ID: 4EX0).
- Open the PDB database website:
- Enter 'human insulin' or the PDB ID '4EX0' in the search box and press Enter.
- This will display various results for related protein structures. Click on the desired protein's name. It will show the complete information on this protein.
- You can download the PDB file of this protein by clicking on 'Download Files' --> PDB File (Text).
- In this PDB file you will find various information about the protein, such as organism name, amino acid sequence, chains, atom coordinates, ligand molecule coordinates, heteroatom coordinates etc.
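The atom and heteroatom coordinates mentioned above sit in fixed-width ATOM and HETATM lines of the PDB file. The example below is a minimal sketch of reading them, assuming the standard PDB column layout. The function name parse_atoms and the sample lines are made up for illustration; the sample values are not taken from entry 4EX0.

```python
def parse_atoms(pdb_text):
    """Return one dict per ATOM/HETATM line of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, SEQRES, END and all other records
        atoms.append({
            "record": record,                  # columns 1-6
            "serial": int(line[6:11]),         # columns 7-11
            "name": line[12:16].strip(),       # columns 13-16
            "res_name": line[17:20].strip(),   # columns 18-20
            "chain": line[21],                 # column 22
            "res_seq": int(line[22:26]),       # columns 23-26
            "x": float(line[30:38]),           # columns 31-38
            "y": float(line[38:46]),           # columns 39-46
            "z": float(line[46:54]),           # columns 47-54
        })
    return atoms


# Made-up sample lines in PDB layout (illustrative values only).
SAMPLE_PDB = "\n".join([
    "HEADER    HORMONE",
    "ATOM      1  N   GLY A   1      -8.863  16.944  14.289  1.00 21.88           N",
    "ATOM      2  CA  GLY A   1      -7.500  16.000  14.000  1.00 20.00           C",
    "HETATM  101 ZN    ZN A 101       0.000   0.000   7.500  1.00 15.00          ZN",
    "END",
])

for atom in parse_atoms(SAMPLE_PDB):
    print(atom["record"], atom["name"], atom["res_name"], atom["chain"],
          atom["x"], atom["y"], atom["z"])
```

To use it on the downloaded file, read the file's text with open(...).read() and pass it to parse_atoms. Slicing by column is used instead of splitting on whitespace because adjacent PDB fields may touch with no space between them.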