Three-Day National Workshop
on
"Bioinformatics and Systems Biology"
06 - 08 March 2019
Go Back
Lab 1:
Section - 1: Databases in Bioinformatics
Accessing Biological Sequences and Structure Databases
Objective
The objective of this exercise is to make students aware about the various biological informations & databases
available.
Here we will learn to access NCBI for nucleic and protein sequences and PDB for the structures of proteins.
Primary Biological Databases
NCBI - National Center for Biotechnology Information
- Division of the National Library of Medicine (NLM) at the United States National Institutes of Health
- Established in November of 1988 at Bethesda, Maryland, USA.
- http://www.ncbi.nlm.nih.gov/
EBI (European Bioinformatics Institute) - European Molecular Biology Laboratory (EMBL)
- EMBL (European Molecular Biology Laboratory) is Europe's flagship laboratory for the life sciences
- Established in 1980 at the European Molecular Biology Laboratory in Heidelberg, Germany in September 1994
- EMBL-EBI was firmly established at Wellcome Trust Genome Campus in Hinxton in the UK.
- http://www.ebi.ac.uk/about
DDBJ(DNA Data Bank of Japan) - National Institute of Genetics
- DDBJ Center collects nucleotide sequence data as a member of INSDC (International Nucleotide Sequence Database Collaboration)
- First asian Nucleotide Datababse Centre
- Actively working from 1986 at National Institute of Genetics, Japan.
- http://www.ddbj.nig.ac.jp/
Protein Data Bank (structure db)
- Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
- The data, typically obtained by X-ray crystallography or NMR spectroscopy
- Established in 1971 at Brookhaven National Laboratory
- In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB.
- http://rcsb.org/pdb/home/home.do
- PDB database actively available as:
- RCSB PDB: USA
- PDBe: Europe
- PDBj: Japan
- RCSB PDB: USA
PubChem
PubChem, released in 2004, provides information on the biological activities of small molecules.
PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system.
These are PubChem Substance, PubChem Compound, and PubChem BioAssay.
PubChem also provides a fast chemical structure similarity search tool.
PubChem
https://pubchem.ncbi.nlm.nih.gov
DrugBank Version 5.0
PubChem, released in 2004, provides information on the biological activities of small molecules.
PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system.
These are PubChem Substance, PubChem Compound, and PubChem BioAssay.
PubChem also provides a fast chemical structure similarity search tool.
3.DrugBank Version 5.0
https://www.drugbank.ca/
Go Back
Section -2
Basic Local Alignment Search Tool (BLAST) is a dynamic programming algorithm for sequence alignment.
It is
based on smith-waterman algorithm, means on the concept of local alignment.
For an alignment blast uses a
query sequence which is taken for the alignment and one or more subject seqeunces on which the query sequence
is aligned.
Nucleotide Blast | Search a nucleotide database using a nucleotide query |
protein blast | Search protein database using a protein query |
blastx | Search protein database using a translated nucleotide query |
tblastn | Search translated nucleotide database using a protein query |
tblastx | Search translated nucleotide database using a translated nucleotide query |
Here we will go for alignment of nucleotide query sequence i.e. Human Insulin with complete nucleotide database.
STEPS:
- Open BLAST tool
Link is :http://blast.ncbi.nlm.nih.gov/Blast.cgi - Click on nucleotide blast (blastn).
- Enter fasta sequence or GI number or accession number of query sequence into 'Enter Query Sequence'.
- You can set further parameters for blast.
- The parameters set there are default, if any change is made in these parameters, a light yellow strip is displayed under that parameter.
- After setting up all the parameters click on 'BLAST' button to run blast.
- This will take to the result page in some time.
Result page of BLAST
Result inculudes graphics summary, descriptions, dot plots, alignment, scores etc.
In graphics summary result is shown as different color showing a range of percent similarity score.Towards
black color, less alignment score means less similarity and towards red color more alignment score means
more similarity.
The very first red strip below the range strip with numbers below show the query sequence with its length.
After that all colored lines show alignment of query sequence with various (subject) sequences available in
database.
After that description of subject sequences are given. It includes name of subject sequence, scores,
indentity with query, e-value and the link of subject sequence.
After 'Description' alignment of query sequence with various subject sequences are given. It shows base to
base alignment with matches, mismatches and gaps as well.
Section - 3
Go Back
[A.] Visualization of Protein using RasMol
Introduction
Rasmol is a computer program written for molecular graphics visualization intended and used primarily for the depiction and exploration of biological macromolecule structures.
Getting Started
- Start Rasmol from your computer's Dektop.
- This will open two Windows (one Black window, another white command-line window)
- Commands below preceded by M are best done from the pull-down Menus. Command NOT preceded by M must be typed in the white command-line window. (RasMol has two windows, one black and one white. On windows, the white command-line window starts minimized. look for it on the taskbar. Command with blue colour listed below seprated by semicolons should be typed on separate lines into the white window, pressing Enter after each command.
- Run RasMol, and do M(enu) File-- Open. Select 1d66.pdb(gal4 transcriptional regulator complexed to DNA).
- How many Chains are there?
- reset; rotate z 90; zoom 150; rotate y 40
- M(enu) Display-- Backbone, M colours-- Chain
(Now each chain in different colour. Click on each chain to report its ID letter code) - Is there anything else in this PDB file besides the protein/DNA chains?
- select hetero; M Display—Spacefill
(Now you can see oxygen from water in the X-rayed crystal.) - M Colours-- CPK
- Restrict not water
(This hides water; Click on what remains to find out what it is.) - What are the hydrophobic aminoacids?
- select hydrophobic; color magenta; wireframe 0.4; select not water
- M Display-- Spacefill; M Option-- Slab mode
- What holds the CD ions in place?
- M Display-- Backbone; M Colours--Chain
- select cd; M Display-- Spacefill; M Colours-- CPK
- select within(2.6, cd)
(This selects all atoms within 2.6 Angstroms of the Cd++ ions) - M Display-- Spacefill; M Colours-- CPK
- save script myview1.spt; M File-- Close; script myview1.spt
(Restore script) - Where are the alpha helices and beta strands?
- M Edit-- Select all; M Display-- Backbone; M Colours-- Structure
(This colors alpha helices purple, and beta strands yellow (there aren't any beta strand in 1d66.pdb) Turns appear blue). - How do I find distance between two atoms?
- M Display-- Spacefill; set picking
(Now click on two atoms, and watch the report in command line window) - set picking monitor
(Now click on two atoms and watch report at graphic window) - color monitor white; set monitor off; monitor off; set picking ident.
- How do I see the inside of a molecule?
- Don't rotate the molecule with mouse at any time during this sequence.
- reset; M Edit-- Select All; M Display-- Spacefill; M Colour-- Chain.
- rotate x 83; zoom 200; M Option-- Hetero Atoms
(Toggle off waters) - select dna; color cpk; M Option-- Slab Mode
(Toggle on slab mode) - slabmode section;
(Now only cut face is shown) - slab 76
(Now you can see GC base pair). - slab 68
- How do I keep the DNA from rotating off screen?
- reset; restrict dna; rotate z 90;
(Try rotating around the axis of the DNA helix, move the mouse up and down) - center selected
(Now try again and notice the difference). - How do I get multiple representation of the same atoms?
- restrict :d; M Colours-- CPK
(:d means all atoms in chain D) - M Display-- Backbone, M Display-- Ball & Stick
- backbone 1.
(Be sure to include the decimal point after the 1, which make RasMol interpret it as Angstroms) - spacefill off; wireframe 0.5; wireframe 0.1; spacefill 0.3; backbone 0.1; zoom 500
- How do I label an atom?
- set picking label
(Now click on a few atoms) - color labels white; label off; set picking ident
(Click on an atom and notice its own ID number(3rd word in the report). We'll refer to the number as ### in the command below) - select atomno = ###; label "My Favorite atom"; label off
- How do I see the molecule in stereo?
- M Option-- Stereo; stereo -5
- Where are the disulphide bonds?
- M Display-- Wireframe; ssbonds 0.8; color ssbonds yellow
- M Edit-- Select All; M Display-- Backbone; M Colour-- Chain
- set ssbonds backbone
- Where are the hydrogen bonds?
- M Edit-- Select All; M Display--Backbone; M Colour-- Structure
- restrict helix; backbone 0; hbonds 0.5; color hbonds white
- set hbonds backbone; hbonds off; restrict sheet; restrict not (helix or sheet)
- Some powerful command with select
- select single Amino acid----use three letter abbreviation like lue27.
Example select lue27 - Amino acid type........... abbreviation like aa type
Example select asp - Entire molecule
Example select * - Chain -select *chain letter
select *A - Select protein and select not protein.
- Molecule description with show command.....
- show information
- show selected
- show sequence
- Slab mode.......
- Allows you to look at internal regions of the molecule, by slicing on the z axis.
- Zero is define as behind the molecule and 100 is defined as in front of the molecule.
- slab 0-100, example slab 50 cuts at approximately the mid line of the molecule.
- Stereo Command..........
- Provides side by side display of the molecule.
- Make sure entire molecule is selected
- Ras MOL>stereo.
- On and Off commands..........
Command | Action |
---|---|
wireframe on/off | Display wireframe |
wireframe 0.2 | Display stick bonds of radius 0.2 |
backbone on/off | Display CA backbone only |
backbone 0.2 | Display CA backbone as sticks of radius 0.2 |
spacefill on/off | Atoms as spacefill spheres |
spacefill 1.0 | Atoms as spheres of radius 1.0 |
ribbons on/off | Display molecule ribbons |
ribbons 1.0 | Display molecule ribbons, width 1.0 |
cartoon on/off | Display protein cartoon |
dots on/off | Display dot-surface about all atoms |
dots 10 | Display low-density dot-surface |
hbonds on/off | Display hydrogen bonds |
hbonds 0.1 | Display hydrogen bonds, radius 0.1 |
set axes on/off | Display coordinate axes |
set boundbox on/off | Display bounding box |
Go Back
Section - 4 Protein Structure Prediction: Homology Modeling
Steps:
- Take a protein sequence in FASTA format whose structure is to be modelled. This is our 'Target Sequence'.
- Now go to the webpage http://swissmodel.expasy.org/
- Paste target sequence into swiss model workspace in FASTA format
(you can also upload target sequence )
you can provide a Project title and email-id-
You have 2 options now you can either search for templates or build models directly
- It will build a model for you and the result will after a few seconds.
- We can also download the detailed results
For example >CAA08766.1 insulin, partial [Homo sapiens] MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAE
Press the button 'build model'.
Swissmodel provide 3 best results according to their score
It provide- Sequence identity, Alignment and Structure
Go Back
Protein Structure Validation
The Structure Analysis and Verification using SAVES Server
Steps:
- Open SAVES SERVER webpage: http://services.mbi.ucla.edu/SAVES/
- Upload the PDB file.
- RUN all programs
Section - 5: Molecule Sketching
[A.] To understand and draw molecule sketching , SMILE file format and calculating properties
Introduction
Biologically active compound act as a drug.these software is helpful for sketching molecule. Smiles file
format is The Simplified Molecular Input Line Entry Specification (SMILES) a line notation for
molecules. SMILES strings include connectivity but do not include 2D or 3D coordinates.
Hydrogen atoms are not represented. Other atoms are represented by their element symbols
B, C, N, O, F, P, S, Cl, Br, and I. The symbol "=" represents double bonds and "#" represents triple
bonds. Branching is indicated by (). Rings are indicated by pairs of digits.
Name Formula SMILES String
Methane CH4 C
Ethanol C2H6O CCO
Benzene C6H6 C1=CC=CC=C1 or c1ccccc1
Ethylene C2H4 C=C
STEPS:
- Open the site
http://www.molinspiration.com - click on free on-line cheminformatics services
- Use the icon | for single carbon || for double bond
- Red cross is eraser
- Sketch molecule aspirin.
- For it Open site
http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound - search for aspirin
- For smiles format take it from pubchem canonical smiles
- paste smiles format CC(=O)OC1=CC=CC=C1C(=O)O in paste smiles here.
- click on calculate properties and predict bioactivity
- You will get output predict bioactivity
- You will get output for calculate properties.
GPCR ligand | -0.66 |
Ion channel modulator | -0.91 |
Kinase inhibitor | -0.49 |
Nuclear receptor ligand | -1.23 |
miLogP | 1.434 |
TPSA | 63.604 |
natoms | 13 |
MW | 180.159 |
nON | 4 |
nOHNH | 1 |
nviolations | 0 |
nrotb | 3 |
volume | 155.574 |