References:
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html

GenBank Background:

GenBank is a publicly available genetic sequence database funded through the National Institutes of Health (NIH). GenBank is also part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.

As of June 2001, GenBank held approximately 12,974,000,000 bases in 12,244,000 sequence records. As an example, you may view the record for a Saccharomyces cerevisiae (common baker's yeast) gene. The following brief tutorial will help you navigate the database to locate a specific gene, as required by the assignment.

Usage

1.) Click on the photo below. A new browser window resembling the photograph should pop up.

2.) Search "GenBank" for "AJ297717" by typing it into the search box (highlighted in red above).

3.) Click on Go.

4.) Clicking on the hyperlink "AJ297717" will yield a page similar to the one below, in which
      significant areas have been highlighted for your benefit and are defined in the table below.
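
The same lookup can also be done without a browser. The following is only a minimal sketch: it assumes NCBI's E-utilities "efetch" service and its "nuccore" nucleotide database, neither of which is covered by this tutorial, and the function names are hypothetical helpers chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the NCBI E-utilities efetch endpoint (not part of this tutorial).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_genbank_url(accession):
    """Build an efetch URL that asks for a GenBank flat-file record as text."""
    params = {
        "db": "nuccore",     # nucleotide database
        "id": accession,     # the accession number typed into the search box
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genbank_record(accession):
    """Download the record text. Requires network access."""
    with urlopen(build_genbank_url(accession)) as response:
        return response.read().decode("utf-8")


# Usage (not run here because it needs a network connection):
#     record_text = fetch_genbank_record("AJ297717")
#     print(record_text[:500])
```

The text returned this way should contain the same fields as the web page described in step 4, without the highlighting.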

 

Key Features Highlighted Above

Sequence Length: The number of base pairs within the sequence
Molecule Type: The type of molecule that was sequenced
Modification Date: Date of last modification (to the GenBank entry)
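
In the plain-text form of a GenBank record, these three fields appear together on the first line, which begins with the word LOCUS. As a rough sketch of how they could be read by a program, the hypothetical helper below splits such a line on whitespace. The example line is illustrative only and is not copied from record AJ297717.

```python
def parse_locus_line(line):
    """Extract the length, molecule type, and modification date from a LOCUS line."""
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    bp_index = tokens.index("bp")  # the length is the token just before "bp"
    return {
        "name": tokens[1],
        "length": int(tokens[bp_index - 1]),
        "molecule_type": tokens[bp_index + 1],
        "modification_date": tokens[-1],  # the date is the last field on the line
    }


# Illustrative line in the flat-file layout (an assumption, not real data).
example = "LOCUS       EXAMPLE01    5028 bp    DNA             PLN       21-JUN-1999"
info = parse_locus_line(example)
print(info["length"], info["molecule_type"], info["modification_date"])
```

Locating the fields relative to the "bp" token, rather than by fixed position, keeps the sketch working when extra fields appear between the molecule type and the date.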

Key Features Highlighted Above (Continued)

Accession Number: An identification number unique to each GenBank submission
Source Organism: The complete nomenclature/classification of the test organism
Research Papers Contributing to the Sequence: Publications that discuss the genomic data reported.
Key Features of the Sequence: This section may identify (among others):

1.) source- Biological source of the specified span of the sequence.

2.) coding regions (CDS)- sequence of nucleotides that corresponds
                                         with the sequence of amino acids in a protein.

3.) exon- A region of the genome that codes for a portion of spliced
               mRNA, rRNA, or tRNA; may contain the 5' UTR
               (region preceding the initiation codon), coding regions,
               and the 3' UTR (region following the stop codon)

4.) intron- A segment of DNA that is transcribed, but removed from
                within the RNA transcript by splicing together the
                sequences (exons) on either side of it.

5.) gene- Within the context of this section, a gene is described as a
                region of biological interest, for which a name has been
                assigned.

6.) mRNA- messenger RNA; includes the 5' untranslated region (5' UTR),
                   coding sequences (CDS, exon), and the 3' untranslated region
                   (3' UTR)

7.) rRNA- mature ribosomal RNA; the RNA component of the
                 ribonucleoprotein particle (ribosome) which assembles
                 amino acids into proteins.

8.) tRNA- transfer RNA; mediates the translation of a nucleic acid
                 sequence into an amino acid sequence (gene product).
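
In the plain-text form of a record, each of the feature types listed above appears as a key (source, CDS, gene, and so on) followed by the span of bases it covers. The sketch below shows one possible way to list them. It assumes the usual flat-file layout, in which feature keys are indented five spaces and their qualifiers are indented further. The excerpt and the function name are made up for illustration and do not come from record AJ297717.

```python
# Illustrative feature table excerpt (an assumption, not real data).
SAMPLE_FEATURES = """\
FEATURES             Location/Qualifiers
     source          1..5028
                     /organism="Saccharomyces cerevisiae"
     gene            687..3158
                     /gene="EXAMPLE"
     CDS             687..3158
                     /product="hypothetical protein"
"""


def list_features(text):
    """Return (key, location) pairs for each feature in a feature table excerpt."""
    features = []
    for line in text.splitlines():
        # Feature keys start after exactly 5 spaces; qualifier lines are indented more.
        if line.startswith("     ") and not line.startswith("      "):
            parts = line.split(None, 1)
            if len(parts) == 2:
                features.append((parts[0], parts[1].strip()))
    return features


for key, location in list_features(SAMPLE_FEATURES):
    print(key, location)
```

A location such as 687..3158 gives the first and last base of the feature, so the span covers 3158 - 687 + 1 = 2472 bases.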

 

Base Composition: Identifies the number of A, C, G, and T bases within the sequence.
Genetic Code: The base sequence (starting with the 5' end) of the gene investigated.
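
The base composition can be recomputed directly from the base sequence. The following is a minimal sketch, assuming the sequence is laid out as in the flat-file ORIGIN block (a position number followed by groups of ten lowercase bases). The sample fragment and helper names are illustrative and are not taken from record AJ297717.

```python
from collections import Counter

# Illustrative sequence block (an assumption, not the AJ297717 sequence).
SAMPLE_ORIGIN = """\
ORIGIN
        1 gatcctccat atacaacggt atctccacct
       31 caggtttaga
//
"""


def sequence_from_origin(origin_text):
    """Join the sequence lines, dropping position numbers, spaces, and markers."""
    bases = []
    for line in origin_text.splitlines():
        if line.startswith("ORIGIN") or line.startswith("//"):
            continue  # skip the block header and the end-of-record marker
        # Keep only a, c, g, t; ambiguity codes such as n are ignored in this sketch.
        bases.extend(ch for ch in line.lower() if ch in "acgt")
    return "".join(bases)


def base_composition(sequence):
    """Count the A, C, G, and T bases in a sequence."""
    counts = Counter(sequence.lower())
    return {base: counts.get(base, 0) for base in "acgt"}


sequence = sequence_from_origin(SAMPLE_ORIGIN)
print(len(sequence), base_composition(sequence))
```

The four counts should add up to the sequence length whenever the sequence contains only A, C, G, and T.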

Last Revised: 7/18/2001 17:10
TJ Adase