Major Databases
and Repositories of Biomedical Information
- Major Databases and Search Portals:
- NCBI,
National
Center for Biotechnology Information, a division of the National
Library
of Medicine (NLM) at the National Institutes of Health (NIH).
- EBI,
European Bioinformatics Institute
is a centre for research and services in bioinformatics. The Institute
manages databases of biological data including nucleic acid, protein
sequences
and macromolecular structures.
- SRS is a data
retrieval system that integrates heterogeneous databanks in molecular
biology
and genome analysis. It currently provides access to over 300 different
databanks.
- GenomeNet
is a Japanese network
of database and computational services for genome research and related
research areas in molecular and cellular biology.
- ANGIS,
Australian National
Genomic Information Service
- ExPASy,
Expert Protein Analysis
System, proteomics server of the Swiss Institute of
Bioinformatics
(SIB).
- BCM
Search Launcher
The Baylor College of Medicine Search Launcher is an ongoing project
to
organize molecular biology-related search and analysis services
available
on the WWW by function, providing a single point of entry for related
searches.
- SeWeR
SEquence analysis using WEb Resources. SeWeR is an integrated portal to
common web-based services in bioinformatics. It is written entirely in
JavaScript 1.2, so it runs only in Netscape 4.0 or higher and
Internet
Explorer 4.0 or higher.
- Text Queries of Multiple Databases:
- STAG
"Search
Texts in All over the Genomenet", a metasearch interface from Japan
that
runs a simultaneous text query of the following
databases:
GenBank, EMBL, EPD, RefSeq, SWISS-PROT, PIR, PRF, PDBSTR, PROSITE,
PRINTS,
COMPOUND, ENZYME, GENES, GENOMES, BRITE, OMIM, LITDB, PDB, TRANSFAC,
PMD,
AAindex, Pfam
- euGenes
Genomic
Information for Eukaryotic Organisms. The goal of this developing site
is to provide a common interface for the major model eukaryotic
organism
databases, which include Drosophila melanogaster (FlyBase), Homo
sapiens (NCBI), Mus musculus (MGD), Arabidopsis thaliana
(AtDB), Caenorhabditis elegans (ACeDB), Saccharomyces
cerevisiae
(SGD & YPD) and Danio rerio (ZFIN).
- Keynet
A Keywords Database
for Biosequences Functional Organization. Keynet is a database of
Keywords
extracted from the EMBL and GenBank databases. The Keynet structure is
based
on biological criteria intended to assist the user in data searching and
to
minimize the risk of losing information.
- PBIL
(Pôle
Bio-Informatique Lyonnais) server allows browsing of the
following
general and specialized sequence databases: GenBank, NBRF, EMBL,
SWISS-PROT/TrEMBL,
Hovergen, Hobacgen-nucl, Hobacgen-Protein, RTKdb, NRSub and EMGLib.
- Biocatalog,
a directory of general-interest software in molecular biology and
genetics.
- Major
DNA Databases: Each
of the following three databases collects a portion of the total
sequence
data reported worldwide, and all new and updated database entries
are exchanged between them on a daily basis:
- GenBank
- EMBL
Nucleotide Sequence
Database constitutes Europe's primary nucleotide sequence resource.
Main sources for DNA and RNA sequences are direct submissions from
individual
researchers, genome sequencing projects and patent applications.
- DDBJ,
DNA Data Bank
of Japan, is one of the three major DNA data depositories, along with
NCBI
and EBI.
- ENCODE Project: ENCyclopedia Of DNA Elements. This project aims to identify all functional elements in the human genome sequence.
- Major RNA
Databases:
- GtRNAdb: The Genomic tRNA Database. This genomic tRNA database contains tRNA identifications made by the program tRNAscan-SE on complete or nearly complete genomes.
- RNA modification database provides a comprehensive listing of posttranscriptionally modified nucleosides from RNA.
- tRNA sequences and sequences of tRNA genes. This compilation contains 3279 sequences of tRNAs and tRNA genes covering the literature up to December 1996.
Searches can now be performed using the Utah mirror interface.
- UTResource collects data and analysis tools for the functional classification of the 5' and 3' UTRs of eukaryotic mRNAs.
- Regulatory noncoding RNAs database. The
noncoding RNA (ncRNA) database is intended to provide information on
the sequences and functions of transcripts which do not code for
proteins, but perform regulatory roles in the cell. The sequences
included in the database have been at least partially characterized in
terms of their function or expression.
- Links to various RNA Databases
- RNA World Website at FLI Jena. This web resource lists Internet links on RNA related topics.
- Major Protein
Databases:
NB: for
comprehensive protein
links, see the Proteins page.
- SWISS-PROT
is
a curated protein sequence database which strives to provide a high
level
of annotation (such as the description of the function of a protein,
its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy and a high level of integration with other
databases.
- PROSITE
is a database of
protein families and domains. It consists of biologically significant
sites,
patterns and profiles that help to reliably identify to which known
protein
family (if any) a new sequence belongs
- PIR,
Protein Information Resource,
a comprehensive, non-redundant, expertly annotated, fully classified
and extensively cross-referenced protein sequence database. The PIR-PSD,
iProClass
and other PIR auxiliary databases integrate sequence,
functional, and structural information to support genomics and
proteomics
research.
- MIPS,
Munich Information
Centre for Protein Sequences.
- PEDANT
Protein
Extraction, Description, and Analysis Tool
- OWL is a
non-redundant
composite of 4 publicly-available primary sources: SWISS-PROT, PIR
(1-3),
GenBank (translation) and NRL-3D. SWISS-PROT is the highest priority
source,
all others being compared against it to eliminate identical and
trivially-different
sequences. The strict redundancy criteria render OWL relatively "small"
and hence efficient in similarity searches.
- EBI
Proteome Analysis database
provides comprehensive statistical and comparative analyses of the
predicted
proteomes of fully sequenced organisms. The analysis is compiled using
InterPro, CluSTr and GO, and is performed on the non-redundant complete
proteome sets of SWISS-PROT and TrEMBL entries.
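The priority-based merging described for OWL above can be illustrated with a minimal sketch. It is not OWL's actual procedure: the function name `merge_by_priority`, the record layout, and the sample sequences are all hypothetical, and only exact duplicates (after removing whitespace and case differences) are dropped, whereas OWL also removes trivially-different sequences by criteria not given here.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (identifier, amino-acid sequence).
Record = Tuple[str, str]


def merge_by_priority(
    sources: List[Tuple[str, Iterable[Record]]]
) -> List[Tuple[str, str, str]]:
    """Merge sources listed from highest to lowest priority.

    A sequence already contributed by a higher-priority source is skipped,
    so each distinct sequence appears once in the composite.
    """
    seen = set()
    merged = []
    for source_name, records in sources:
        for identifier, sequence in records:
            # Normalise so that whitespace and case do not hide duplicates.
            key = "".join(sequence.split()).upper()
            if key in seen:
                continue
            seen.add(key)
            merged.append((source_name, identifier, key))
    return merged


if __name__ == "__main__":
    # Made-up example data; the first source has the highest priority.
    example = [
        ("SWISS-PROT", [("sp_1", "MKTAYIAK"), ("sp_2", "MSLLTEVE")]),
        ("PIR", [("pir_1", "mktayiak"), ("pir_2", "MGSSHHHH")]),
    ]
    for row in merge_by_priority(example):
        print(row)
```

In this sketch the PIR record `pir_1` is dropped because the same sequence was already taken from the higher-priority source, which is why a composite built this way stays comparatively small.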
- Major 3-D Structure
Databases:
- PDB,
Protein Data Bank, the worldwide
repository for the processing and distribution of 3-D biological
macromolecular
structure data.
- NDB,
The Nucleic Acid Database
Project assembles and distributes structural information about
nucleic
acids.
- CCDC,
Cambridge Crystallographic
Data Centre contains information on crystal structures for over 230,000
organic and metal-organic compounds, intermolecular interactions,
protein-ligand
interactions and docking, etc.
- Major
Bibliographic Databases:
- PubMed,
a service
of the National Library of Medicine, provides access to over 11 million
citations from MEDLINE and additional life science journals. PubMed
includes
links to many sites providing full text articles and other related
resources.
- NLM
Gateway allows users
to search in multiple retrieval systems at the U.S. National Library of
Medicine. The current Gateway searches MEDLINE/PubMed, OLDMEDLINE,
LOCATORplus,
AIDS Meetings, Health Services Research Meetings, HSRProj, MEDLINEplus
and DIRLINE.
- Medline
Plus. This service provides access to extensive information about
specific
diseases and conditions and also has links to consumer health
information
from the National Institutes of Health, dictionaries, lists of
hospitals
and physicians, health information in Spanish and other languages, and
clinical trials.
- Ingenta
covers a somewhat wider range of the biological literature than
PubMed.
In addition, Ingenta covers the natural sciences, mathematics, and the
humanities.
- SeqAnalRef
(ExPASy) is
a bibliographic reference data bank of papers dealing with
sequence
analysis. This data bank stores the references of articles from the
expanding
field of mathematical and computer analysis of biomolecular sequences.
- Cancerlit is
a bibliographic database that contains more than 1.5 million citations
and abstracts from over 4,000 different sources including biomedical
journals,
proceedings, books, reports, and doctoral theses. Produced by the
National
Cancer Institute's International Cancer Information Center.
- Major
Genome Projects
and Databases:
- Entrez
Genome at NCBI. The whole genomes of over 800 organisms can be
found
in Entrez Genomes. The genomes represent both completely sequenced
organisms
and those for which sequencing is in progress.
- Completed
Genomes at the EBI
- Ensembl
Human
Genome Server
is a joint project between EMBL-EBI and the Sanger Centre. Ensembl
provides
identification of 90% of known human genes in the genome sequence and
prediction
of 10,000 additional genes, all with supporting evidence. With Ensembl
you can search the DNA from the human genome, browse chromosomes, find
genes, SNPs and mouse genome matches, look for proteins and
protein
families.
- GDB, The
Genome Database, an international
collaboration in support of the Human Genome Project. Hosted by
The Hospital for Sick Children, Toronto, Ontario, Canada.
- JGI
Genome Portal links to genome databases of the following organisms:
AOM (anaerobic oxidation of methane) microbial community, Chlamydomonas
reinhardtii, Ciona intestinalis
(sea squirt), Fugu
rubripes (pufferfish), Homo sapiens
(chromosomes 5 and 19), Phanerochaete
chrysosporium (white rot fungus), Phytophthora
ramorum, Phytophthora sojae (soybean root rot), Populus trichocarpa
(poplar), Thalassiosira
pseudonana, Xenopus tropicalis (frog).
- KEGG
Complete
Genomes
- PIR
Complete
Genomes: list of species which have complete protein sequences in
PIR
at MIPS
- Sanger
Centre
is a genome research
centre founded by the Wellcome Trust and the Medical Research Council,
UK. Projects include large scale sequencing and analysis of the
following
genomes: H. sapiens, C. elegans, mouse, S. pombe, zebrafish,
microbes,
protozoans.
- TIGR
Databases are a collection
of curated databases containing DNA and protein sequence, gene
expression,
cellular role, protein family, and taxonomic data for microbes, plants
and humans. Eukaryotes include: H. sapiens, Arabidopsis,
rice,
potato, parasites (Trypanosoma brucei, Trypanosoma cruzi,
Plasmodium
falciparum, Plasmodium yoelii, and Entamoeba histolytica), Cryptococcus
neoformans, Aspergillus fumigatus, etc.
- WU GSC
Washington University
Genome Sequencing Center. Projects include sequencing of the Human
Genome and that of the following model organisms: S. cerevisiae,
C. elegans, C. briggsae, A. thaliana, and several bacterial
genomes. EST projects: Human, Mouse, Zebrafish, Toxoplasma,
Soybean,
Xenopus, Parasitic Nematodes, Moss, Eimeria, Pancreas, Elegans,
Leishmania