Announcement

BIRLA INSTITUTE OF SCIENTIFIC RESEARCH
Three-Day National Workshop
on
"Bioinformatics and Systems Biology"
06 - 08 March 2019


Day1

Go Back

Lab 1:

Section - 1: Databases in Bioinformatics

Accessing Biological Sequences and Structure Databases

Objective

The objective of this exercise is to make students aware about the various biological informations & databases available.
Here we will learn to access NCBI for nucleic and protein sequences and PDB for the structures of proteins.

Primary Biological Databases

NCBI - National Center for Biotechnology Information

  1. Division of the National Library of Medicine (NLM) at the United States National Institutes of Health
  2. Established in November of 1988 at Bethesda, Maryland, USA.
  3. http://www.ncbi.nlm.nih.gov/

EBI (European Bioinformatics Institute) - European Molecular Biology Laboratory (EMBL)

  1. EMBL (European Molecular Biology Laboratory) is Europe's flagship laboratory for the life sciences
  2. Established in 1980 at the European Molecular Biology Laboratory in Heidelberg, Germany in September 1994
  3. EMBL-EBI was firmly established at Wellcome Trust Genome Campus in Hinxton in the UK.
  4. http://www.ebi.ac.uk/about

DDBJ(DNA Data Bank of Japan) - National Institute of Genetics

  1. DDBJ Center collects nucleotide sequence data as a member of INSDC (International Nucleotide Sequence Database Collaboration)
  2. First asian Nucleotide Datababse Centre
  3. Actively working from 1986 at National Institute of Genetics, Japan.
  4. http://www.ddbj.nig.ac.jp/

Protein Data Bank (structure db)

  1. Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
  2. The data, typically obtained by X-ray crystallography or NMR spectroscopy
  3. Established in 1971 at Brookhaven National Laboratory
  4. In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB.
  5. http://rcsb.org/pdb/home/home.do
  6. PDB database actively available as:
    1. RCSB PDB: USA
    2. PDBe: Europe
    3. PDBj: Japan


PubChem

PubChem, released in 2004, provides information on the biological activities of small molecules.

PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system.

These are PubChem Substance, PubChem Compound, and PubChem BioAssay.

PubChem also provides a fast chemical structure similarity search tool.

PubChem
https://pubchem.ncbi.nlm.nih.gov

DrugBank Version 5.0

PubChem, released in 2004, provides information on the biological activities of small molecules.

PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system.

These are PubChem Substance, PubChem Compound, and PubChem BioAssay.

PubChem also provides a fast chemical structure similarity search tool.

3.DrugBank Version 5.0
https://www.drugbank.ca/



Go Back



Datababse Searching Using BLAST

Section -2

Basic Local Alignment Search Tool (BLAST) is a dynamic programming algorithm for sequence alignment.

It is based on smith-waterman algorithm, means on the concept of local alignment.

For an alignment blast uses a query sequence which is taken for the alignment and one or more subject seqeunces on which the query sequence is aligned.

Nucleotide Blast Search a nucleotide database using a nucleotide query
protein blast Search protein database using a protein query
blastx Search protein database using a translated nucleotide query
tblastn Search translated nucleotide database using a protein query
tblastx Search translated nucleotide database using a translated nucleotide query


Here we will go for alignment of nucleotide query sequence i.e. Human Insulin with complete nucleotide database.

STEPS:

  1. Open BLAST tool
    Link is :http://blast.ncbi.nlm.nih.gov/Blast.cgi

  2. Click on nucleotide blast (blastn).


  3. Enter fasta sequence or GI number or accession number of query sequence into 'Enter Query Sequence'.

  4. You can set further parameters for blast.

  5. The parameters set there are default, if any change is made in these parameters, a light yellow strip is displayed under that parameter.

  6. After setting up all the parameters click on 'BLAST' button to run blast.

  7. This will take to the result page in some time.

Result page of BLAST

Result inculudes graphics summary, descriptions, dot plots, alignment, scores etc.

In graphics summary result is shown as different color showing a range of percent similarity score.Towards black color, less alignment score means less similarity and towards red color more alignment score means more similarity.

The very first red strip below the range strip with numbers below show the query sequence with its length. After that all colored lines show alignment of query sequence with various (subject) sequences available in database.

After that description of subject sequences are given. It includes name of subject sequence, scores, indentity with query, e-value and the link of subject sequence.

After 'Description' alignment of query sequence with various subject sequences are given. It shows base to base alignment with matches, mismatches and gaps as well.

Section - 3

Go Back

[A.] Visualization of Protein using RasMol

Introduction

Rasmol is a computer program written for molecular graphics visualization intended and used primarily for the depiction and exploration of biological macromolecule structures.

Getting Started

  • Start Rasmol from your computer's Dektop.

  • This will open two Windows (one Black window, another white command-line window)

  • Commands below preceded by M are best done from the pull-down Menus. Command NOT preceded by M must be typed in the white command-line window. (RasMol has two windows, one black and one white. On windows, the white command-line window starts minimized. look for it on the taskbar. Command with blue colour listed below seprated by semicolons should be typed on separate lines into the white window, pressing Enter after each command.

  • Run RasMol, and do M(enu) File-- Open. Select 1d66.pdb(gal4 transcriptional regulator complexed to DNA).

  1. How many Chains are there?

    • reset; rotate z 90; zoom 150; rotate y 40

    • M(enu) Display-- Backbone, M colours-- Chain
      (Now each chain in different colour. Click on each chain to report its ID letter code)

  2. Is there anything else in this PDB file besides the protein/DNA chains?

    • select hetero; M Display—Spacefill
      (Now you can see oxygen from water in the X-rayed crystal.)

    • M Colours-- CPK

    • Restrict not water
      (This hides water; Click on what remains to find out what it is.)

  3. What are the hydrophobic aminoacids?

    • select hydrophobic; color magenta; wireframe 0.4; select not water

    • M Display-- Spacefill; M Option-- Slab mode

  4. What holds the CD ions in place?

    • M Display-- Backbone; M Colours--Chain

    • select cd; M Display-- Spacefill; M Colours-- CPK

    • select within(2.6, cd)
      (This selects all atoms within 2.6 Angstroms of the Cd++ ions)

    • M Display-- Spacefill; M Colours-- CPK

    • save script myview1.spt; M File-- Close; script myview1.spt
      (Restore script)

  5. Where are the alpha helices and beta strands?

    • M Edit-- Select all; M Display-- Backbone; M Colours-- Structure
      (This colors alpha helices purple, and beta strands yellow (there aren't any beta strand in 1d66.pdb) Turns appear blue).

  6. How do I find distance between two atoms?

    • M Display-- Spacefill; set picking
      (Now click on two atoms, and watch the report in command line window)

    • set picking monitor
      (Now click on two atoms and watch report at graphic window)

    • color monitor white; set monitor off; monitor off; set picking ident.

  7. How do I see the inside of a molecule?

  8. Don't rotate the molecule with mouse at any time during this sequence.

    • reset; M Edit-- Select All; M Display-- Spacefill; M Colour-- Chain.

    • rotate x 83; zoom 200; M Option-- Hetero Atoms
      (Toggle off waters)

    • select dna; color cpk; M Option-- Slab Mode
      (Toggle on slab mode)

    • slabmode section;
      (Now only cut face is shown)

    • slab 76
      (Now you can see GC base pair).

    • slab 68

  9. How do I keep the DNA from rotating off screen?

    • reset; restrict dna; rotate z 90;
      (Try rotating around the axis of the DNA helix, move the mouse up and down)

    • center selected
      (Now try again and notice the difference).

  10. How do I get multiple representation of the same atoms?

    • restrict :d; M Colours-- CPK
      (:d means all atoms in chain D)

    • M Display-- Backbone, M Display-- Ball & Stick

    • backbone 1.
      (Be sure to include the decimal point after the 1, which make RasMol interpret it as Angstroms)

    • spacefill off; wireframe 0.5; wireframe 0.1; spacefill 0.3; backbone 0.1; zoom 500

  11. How do I label an atom?

    • set picking label
      (Now click on a few atoms)

    • color labels white; label off; set picking ident
      (Click on an atom and notice its own ID number(3rd word in the report). We'll refer to the number as ### in the command below)

    • select atomno = ###; label "My Favorite atom"; label off

  12. How do I see the molecule in stereo?

    • M Option-- Stereo; stereo -5

  13. Where are the disulphide bonds?

    • M Display-- Wireframe; ssbonds 0.8; color ssbonds yellow

    • M Edit-- Select All; M Display-- Backbone; M Colour-- Chain

    • set ssbonds backbone

  14. Where are the hydrogen bonds?

    • M Edit-- Select All; M Display--Backbone; M Colour-- Structure

    • restrict helix; backbone 0; hbonds 0.5; color hbonds white

    • set hbonds backbone; hbonds off; restrict sheet; restrict not (helix or sheet)

  15. Some powerful command with select

    • select single Amino acid----use three letter abbreviation like lue27.
      Example select lue27

    • Amino acid type........... abbreviation like aa type
      Example select asp

    • Entire molecule
      Example select *

    • Chain -select *chain letter
      select *A

    • Select protein and select not protein.

  16. Molecule description with show command.....

    • show information

    • show selected

    • show sequence

  17. Slab mode.......

    • Allows you to look at internal regions of the molecule, by slicing on the z axis.

    • Zero is define as behind the molecule and 100 is defined as in front of the molecule.

    • slab 0-100, example slab 50 cuts at approximately the mid line of the molecule.

  18. Stereo Command..........

    • Provides side by side display of the molecule.

    • Make sure entire molecule is selected

    • Ras MOL>stereo.

  19. On and Off commands..........


CommandAction
wireframe on/off Display wireframe
wireframe 0.2 Display stick bonds of radius 0.2
backbone on/off Display CA backbone only
backbone 0.2 Display CA backbone as sticks of radius 0.2
spacefill on/offAtoms as spacefill spheres
spacefill 1.0 Atoms as spheres of radius 1.0
ribbons on/off Display molecule ribbons
ribbons 1.0 Display molecule ribbons, width 1.0
cartoon on/off Display protein cartoon
dots on/off Display dot-surface about all atoms
dots 10 Display low-density dot-surface
hbonds on/off Display hydrogen bonds
hbonds 0.1 Display hydrogen bonds, radius 0.1
set axes on/off Display coordinate axes
set boundbox on/off         Display bounding box

Go Back

Section - 4 Protein Structure Prediction: Homology Modeling

Steps:

  1. Take a protein sequence in FASTA format whose structure is to be modelled. This is our 'Target Sequence'.

  2. 		For example
    		>CAA08766.1 insulin, partial [Homo sapiens]
    		MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAE
    		
  3. Now go to the webpage http://swissmodel.expasy.org/



  4. Paste target sequence into swiss model workspace in FASTA format

    (you can also upload target sequence )

    you can provide a Project title and email-id-

    You have 2 options now you can either search for templates or build models directly




  5. Press the button 'build model'.


  6. It will build a model for you and the result will after a few seconds.





  7. Swissmodel provide 3 best results according to their score


    It provide- Sequence identity, Alignment and Structure


  8. We can also download the detailed results

Go Back

Protein Structure Validation

The Structure Analysis and Verification using SAVES Server

Steps:

  1. Open SAVES SERVER webpage: http://services.mbi.ucla.edu/SAVES/
  2. Upload the PDB file.
  3. RUN all programs


Section - 5: Molecule Sketching

[A.] To understand and draw molecule sketching , SMILE file format and calculating properties

Introduction

Biologically active compound act as a drug.these software is helpful for sketching molecule. Smiles file format is The Simplified Molecular Input Line Entry Specification (SMILES) a line notation for molecules. SMILES strings include connectivity but do not include 2D or 3D coordinates.

Hydrogen atoms are not represented. Other atoms are represented by their element symbols B, C, N, O, F, P, S, Cl, Br, and I. The symbol "=" represents double bonds and "#" represents triple bonds. Branching is indicated by (). Rings are indicated by pairs of digits.

Name      Formula      SMILES String
Methane      CH4        C
Ethanol      C2H6O        CCO
Benzene      C6H6        C1=CC=CC=C1 or c1ccccc1
Ethylene      C2H4        C=C

STEPS:

  • Open the site
    http://www.molinspiration.com

  • click on free on-line cheminformatics services

  • Use the icon | for single carbon || for double bond

  • Red cross is eraser

  • Sketch molecule aspirin.

  • For it Open site
    http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound

  • search for aspirin

  • For smiles format take it from pubchem canonical smiles

  • paste smiles format CC(=O)OC1=CC=CC=C1C(=O)O in paste smiles here.

  • click on calculate properties and predict bioactivity

  • You will get output predict bioactivity

  • GPCR ligand -0.66
    Ion channel modulator -0.91
    Kinase inhibitor -0.49
    Nuclear receptor ligand -1.23

  • You will get output for calculate properties.


  • miLogP 1.434
    TPSA 63.604
    natoms 13
    MW 180.159
    nON 4
    nOHNH 1
    nviolations 0
    nrotb 3
    volume 155.574