BIOL.ORG

 
 
 DATABASES: SEARCH &  ANALYSIS




 
 
Major Databases and Repositories of Biomedical Information

  • Major Databases and Search Portals:
    • NCBI, National Center for Biotechnology Information, a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH).
    • EBI, European Bioinformatics Institute, is a centre for research and services in bioinformatics. The Institute manages databases of biological data, including nucleic acid sequences, protein sequences and macromolecular structures.
      • SRS is a data retrieval system that integrates heterogeneous databanks in molecular biology and genome analysis. It currently provides access to over 300 different databanks.
    • GenomeNet is a Japanese network of database and computational services for genome research and related research areas in molecular and cellular biology.
    • ANGIS, Australian National Genomic Information Service
    • ExPASy, Expert Protein Analysis System, the proteomics server of the Swiss Institute of Bioinformatics (SIB).
    • BCM Search Launcher: the Baylor College of Medicine Search Launcher is an ongoing project to organize molecular biology-related search and analysis services available on the WWW by function, providing a single point of entry for related searches.
    • SeWeR, SEquence analysis using WEb Resources, is an integrated portal to common web-based services in bioinformatics. It is written entirely in JavaScript 1.2, so it runs only in Netscape 4.0 or higher and Internet Explorer 4.0 or higher.
  • Text Queries of Multiple Databases:
    • STAG, "Search Texts in All over the Genomenet", is a metasearch interface from Japan that runs a simultaneous text query against the following databases: GenBank, EMBL, EPD, RefSeq, SWISS-PROT, PIR, PRF, PDBSTR, PROSITE, PRINTS, COMPOUND, ENZYME, GENES, GENOMES, BRITE, OMIM, LITDB, PDB, TRANSFAC, PMD, AAindex, Pfam.
    • euGenes  Genomic Information for Eukaryotic Organisms. The goal of this developing site is to provide a common interface for the major model eukaryotic organism databases, which include Drosophila melanogaster (FlyBase), Homo sapiens (NCBI), Mus musculus (MGD), Arabidopsis thaliana (AtDB), Caenorhabditis elegans (ACeDB), Saccharomyces cerevisiae (SGD & YPD) and Danio rerio (ZFIN).
    • Keynet A Keywords Database for Biosequences Functional Organization. Keynet is a database of Keywords extracted from EMBL and GenBank databases. The Keynet structure is based on biological criteria aimed to assist the user in data searching and to minimize the risk of loss of information.
    • PBIL (Pôle Bio-Informatique Lyonnais) server lets users browse the following general and specialized sequence databases: GenBank, NBRF, EMBL, SWISS-PROT/TrEMBL, Hovergen, Hobacgen-nucl, Hobacgen-Protein, RTKdb, NRSub and EMGLib.
    • Biocatalog, a directory of general-interest software in molecular biology and genetics.

  • Major DNA Databases: GenBank, EMBL and DDBJ each collect a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between them on a daily basis:
    • GenBank
    • EMBL Nucleotide Sequence Database constitutes Europe's primary nucleotide sequence resource. The main sources of DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications.
    • DDBJ, DNA Data Bank of Japan, is one of the three major DNA data depositories, along with NCBI and EBI.
    • ENCODE Project: ENCyclopedia Of DNA Elements. This project aims to identify all functional elements in the human genome sequence.

  • Major RNA Databases: 
    • GtRNAdb: The Genomic tRNA Database. This genomic tRNA database contains tRNA identifications made by the program tRNAscan-SE on complete or nearly complete genomes.
    • RNA modification database provides a comprehensive listing of posttranscriptionally modified nucleosides from RNA.
    • tRNA sequences and sequences of tRNA genes. This compilation contains 3279 sequences of tRNAs and tRNA genes covering the literature up to December 1996.
      Searches can now be performed using the Utah mirror interface.
    • UTResource collects data and analysis tools for the functional classification of 5' and 3' UTRs of eukaryotic mRNAs.
    • Regulatory noncoding RNAs database. The noncoding RNA (ncRNA) database is intended to provide information on the sequences and functions of transcripts which do not code for proteins, but perform regulatory roles in the cell. The sequences included in the database have been at least partially characterized in terms of their function or expression.
    • Links to various RNA Databases
    • RNA World Website at FLI Jena. This web resource lists Internet links on RNA related topics.

  • Major Protein Databases: 
    • NB: for comprehensive protein links, see the Proteins page.
    • SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
    • PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to identify reliably which known protein family (if any) a new sequence belongs to.
    • PIR, Protein Information Resource, is a comprehensive, non-redundant, expertly annotated, fully classified and extensively cross-referenced protein sequence database. The PIR-PSD, iProClass and other PIR auxiliary databases integrate sequence, functional and structural information to support genomics and proteomics research.
    • MIPS, Munich Information Centre for Protein Sequences.
      • PEDANT  Protein Extraction, Description, and Analysis Tool
    • OWL is a non-redundant composite of 4 publicly-available primary sources: SWISS-PROT, PIR (1-3), GenBank (translation) and NRL-3D. SWISS-PROT is the highest priority source, all others being compared against it to eliminate identical and trivially-different sequences. The strict redundancy criteria render OWL relatively "small" and hence efficient in similarity searches.
    • EBI Proteome Analysis database provides comprehensive statistical and comparative analyses of the predicted proteomes of fully sequenced organisms. The analysis is compiled using InterPro, CluSTr and GO, and is performed on the non-redundant complete proteome sets of SWISS-PROT and TrEMBL entries.
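
The priority-based redundancy removal described for OWL above can be illustrated with a minimal sketch. This is not OWL's actual procedure: the function name `build_nonredundant` and the `(source, identifier, sequence)` record layout are hypothetical, and only exactly identical sequences (ignoring case) are treated as redundant, because the entry does not define what counts as "trivially-different".

```python
# Source priority, highest first, in the order given in the OWL entry.
PRIORITY = ["SWISS-PROT", "PIR", "GenBank", "NRL-3D"]


def build_nonredundant(entries):
    """Return entries with duplicate sequences removed.

    entries: iterable of (source, identifier, sequence) tuples
             (a hypothetical record layout chosen for this sketch).
    When the same sequence occurs in several sources, the copy from
    the highest-priority source is the one that is kept.
    """
    rank = {name: position for position, name in enumerate(PRIORITY)}
    # sorted() is stable, so entries from the same source keep input order.
    ordered = sorted(entries, key=lambda entry: rank[entry[0]])

    seen = set()
    kept = []
    for source, identifier, sequence in ordered:
        key = sequence.upper()  # assumption: exact, case-insensitive match
        if key in seen:
            continue  # already represented by a higher-priority source
        seen.add(key)
        kept.append((source, identifier, sequence))
    return kept


example = [
    ("PIR", "P1", "MKV"),
    ("SWISS-PROT", "S1", "mkv"),
    ("GenBank", "G1", "MKVA"),
    ("NRL-3D", "N1", "MKVA"),
]
nonredundant = build_nonredundant(example)
```

In this toy input the PIR and NRL-3D records are dropped because SWISS-PROT and GenBank, which rank higher, already supply the same sequences. A smaller set of this kind is what makes similarity searches against a composite database faster.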

  • Major 3-D Structure Databases:
    • PDB, Protein Data Bank, the worldwide repository for the processing and distribution of 3-D biological macromolecular structure data.
    • NDB, The Nucleic Acid Database Project  assembles and distributes structural information about nucleic acids.
    • CCDC, Cambridge Crystallographic Data Centre, holds information on crystal structures for over 230,000 organic and metal-organic compounds, intermolecular interactions, protein-ligand interactions and docking, etc.

  • Major Bibliographic Databases:
    • PubMed, a service of the National Library of Medicine, provides access to over 11 million citations from MEDLINE and additional life science journals. PubMed includes links to many sites providing full text articles and other related resources.
    • NLM Gateway allows users to search in multiple retrieval systems at the U.S. National Library of Medicine. The current Gateway searches MEDLINE/PubMed, OLDMEDLINE, LOCATORplus, AIDS Meetings, Health Services Research Meetings, HSRProj, MEDLINEplus and DIRLINE. 
    • Medline Plus provides access to extensive information about specific diseases and conditions. It also has links to consumer health information from the National Institutes of Health, dictionaries, lists of hospitals and physicians, health information in Spanish and other languages, and clinical trials.
    • Ingenta covers the biological literature somewhat more widely than PubMed. In addition, Ingenta covers Natural Sciences, Mathematics, and Humanities.
    • SeqAnalRef (ExPASy) is a bibliographic reference data bank of papers on sequence analysis. It stores references to articles from the expanding field of mathematical and computer analysis of biomolecular sequences.
    • Cancerlit is a bibliographic database that contains more than 1.5 million citations and abstracts from over 4,000 different sources including biomedical journals, proceedings, books, reports, and doctoral theses. Produced by the National Cancer Institute's International Cancer Information Center.

  • Major Genome Projects and Databases: 
    • Entrez Genome at NCBI. The whole genomes of over 800 organisms can be found in Entrez Genomes. The genomes represent both completely sequenced organisms and those for which sequencing is in progress.
    • Completed Genomes at the EBI
    • Ensembl Human Genome Server is a joint project between EMBL-EBI and the Sanger Centre. Ensembl provides identification of 90% of known human genes in the genome sequence and prediction of 10,000 additional genes, all with supporting evidence. With Ensembl you can search the DNA from the human genome, browse chromosomes, find genes, SNPs and mouse genome matches, and look for proteins and protein families.
    • GDB, The Genome Database, an international collaboration in support of the Human Genome Project. Hosted by The Hospital for Sick Children, Toronto, Ontario, Canada.
    • JGI Genome Portal links to genome databases of the following organisms: AOM (anaerobic oxidation of methane) microbial community, Chlamydomonas reinhardtii, Ciona intestinalis (sea squirt), Fugu rubripes (pufferfish), Homo sapiens (chromosomes 5 and 19), Phanerochaete chrysosporium (white rot fungus), Phytophthora ramorum, Phytophthora sojae (soybean root rot), Populus trichocarpa (poplar), Thalassiosira pseudonana, Xenopus tropicalis (frog).
    • KEGG Complete Genomes
    • PIR Complete Genomes List of species which have complete protein sequences in PIR at MIPS
    • Sanger Centre is a genome research centre founded by the Wellcome Trust and the Medical Research Council, UK. Projects include large-scale sequencing and analysis of the following genomes: H. sapiens, C. elegans, Mouse, S. pombe, Zebrafish, Microbes, Protozoans.
    • TIGR Databases are a collection of curated databases containing DNA and protein sequence, gene expression, cellular role, protein family, and taxonomic data for microbes, plants and humans. Eukaryotes include: H. sapiens, Arabidopsis, rice, potato, parasites (Trypanosoma brucei, Trypanosoma cruzi, Plasmodium falciparum, Plasmodium yoelii, and Entamoeba histolytica), Cryptococcus neoformans, Aspergillus fumigatus, etc.
    • WU GSC, Washington University Genome Sequencing Center. Projects include sequencing of the Human Genome and that of the following model organisms: S. cerevisiae, C. elegans, C. briggsae, A. thaliana, and several bacterial genomes. EST projects: Human, Mouse, Zebrafish, Toxoplasma, Soybean, Xenopus, Parasitic Nematodes, Moss, Eimeria, Pancreas, Elegans, Leishmania.


 
 
 
 
Submissions to Databases



 
 
How to Search

Primers, Tutorials and Articles:

Introduction to the Internet:




 

 
 
 
Newsgroups:



 
 
 
 

Copyright © 2000-2006, biol.org. All rights reserved.