W2 Ex Genbank-new-answers - 22111
Course: Introduction to Bioinformatics (22111)
University: Danmarks Tekniske Universitet
2/10/22, 9:59 AM
ExGenbank-new-answers - 22111
https://teaching.healthtech.dtu.dk/22111/index.php/ExGenbank-new-answers
ExGenbank-new-answers
From 22111
Note: numbers in Part 2 and Part 3 are updated on February 7, 2022.
Part 1
QUESTION 1.1
a) Inspecting the FEATURE table of the entry reveals that two CDS regions are defined; therefore there are
two genes in this entry. As stated on the GenBank hand-out, "CDS" is the most stable definition of a
protein-coding gene used in the GenBank format - sometimes "gene" will also be present, but CDS is more
commonly used.
b) Columba livia (Rock pigeon / domestic pigeon)
c) The HEADER contains general information about the entry: organism, publication references, keywords,
accession-ID etc. The FEATURE table contains information that refers to coordinates in the DNA sequence -
for example definition of CDS regions.
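The counting step in 1.1 a) can also be done programmatically. The following is a minimal sketch, assuming the standard GenBank flat-file layout in which feature keys start after exactly five spaces of indentation. The sample text and the function name `count_feature_keys` are hypothetical and only illustrate the idea; they are not the real entry from the exercise.

```python
import re

# Hypothetical, heavily trimmed feature table (NOT the real entry).
SAMPLE_FEATURES = """\
FEATURES             Location/Qualifiers
     source          1..3000
                     /organism="Columba livia"
     CDS             join(100..200,300..400)
                     /product="example protein 1"
     CDS             1500..1800
                     /product="example protein 2"
ORIGIN
"""


def count_feature_keys(genbank_text, key="CDS"):
    """Count feature-table lines whose feature key equals `key`."""
    # Feature keys follow exactly five spaces of indentation;
    # qualifier lines (starting with '/') are indented further.
    pattern = re.compile(r"^ {5}" + re.escape(key) + r"\s", re.MULTILINE)
    return len(pattern.findall(genbank_text))


print(count_feature_keys(SAMPLE_FEATURES))  # 2
```

Counting feature keys rather than searching for the bare word "CDS" avoids false matches in the header, for example in a DEFINITION line ending in "complete cds".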
QUESTION 1.2
a) Since the FEATURE table has been thrown away, we no longer have the coordinates for the genes. As
such they are "in there" somewhere, but we cannot find them without using external information.
b) The entire "ORIGIN" block (all the DNA sequence) has been converted to FASTA format. The
FEATURE table is discarded. From the HEADER block, the definition (title) and accession number are
preserved, the rest is discarded.
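The conversion described in 1.2 b) can be sketched in a few lines. This is an illustration under stated assumptions, not the converter used in the exercise: the sample entry, its accession "XX000000", and the function name `genbank_to_fasta` are hypothetical, a single-line DEFINITION is assumed, and the sequence is written in upper case as a design choice.

```python
# Hypothetical miniature GenBank entry (NOT a real record).
SAMPLE_ENTRY = """\
LOCUS       EXAMPLE1      24 bp    DNA     linear   VRT 01-JAN-2000
DEFINITION  Hypothetical example entry.
ACCESSION   XX000000
FEATURES             Location/Qualifiers
     CDS             1..24
ORIGIN
        1 atggcctaaa ccggttacgt acgt
//
"""


def genbank_to_fasta(genbank_text, width=60):
    """Keep accession, definition and sequence; drop everything else."""
    accession, definition, seq_parts = "", "", []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("ACCESSION"):
            accession = line.split()[1]
        elif line.startswith("DEFINITION"):
            # Assumption: the definition fits on one line.
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Drop the position numbers and the spaces between blocks.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    seq = "".join(seq_parts).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + accession + " " + definition + "\n" + "\n".join(lines) + "\n"


print(genbank_to_fasta(SAMPLE_ENTRY), end="")
```

Note that the FEATURES block never reaches the output, which is exactly why the gene coordinates are lost in the FASTA version.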
QUESTION 1.3
The downloaded file has Unix line endings. Remember from the jEdit exercise that line endings are
indicated by the letters "U", "W" or "M" in the lower right hand corner of the jEdit window.
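The same check can be made without an editor by counting the end-of-line byte sequences in the file. The sketch below is one possible approach; the function name `detect_line_endings` is hypothetical, and the returned letters simply mirror the "U", "W" and "M" labels mentioned above.

```python
def detect_line_endings(data: bytes) -> str:
    """Classify line endings as 'U' (Unix), 'W' (Windows), 'M' (classic Mac),
    'mixed', or 'none'."""
    crlf = data.count(b"\r\n")           # Windows: CR followed by LF
    lf = data.count(b"\n") - crlf        # Unix: LF not part of a CRLF pair
    cr = data.count(b"\r") - crlf        # classic Mac: CR not part of a CRLF pair
    counts = {"W": crlf, "U": lf, "M": cr}
    used = [letter for letter, n in counts.items() if n > 0]
    if not used:
        return "none"
    return used[0] if len(used) == 1 else "mixed"


print(detect_line_endings(b"LOCUS ...\nDEFINITION ...\n"))  # U
```

For a real file, the bytes would be read in binary mode, for example with `open(path, "rb").read()`, so that Python does not translate the line endings before they are counted.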
QUESTION 1.4
a) The "join" statement defines how to extract the coding sequence from the entire length of DNA in the
entry: "join(1104..1192,1306..1510,1614..1742)" is basically a recipe stating to paste together the three
intervals - and we'll get the protein coding part of the gene: the coding exons glued together. The CDS will
always start with a START codon (e.g. ATG) and end with a STOP codon (e.g. TAA).
b) The gene contains three coding exons. Note: from a CDS definition we don't get any information about
UnTranslated Regions (UTRs) that are often found before and after the coding region in the mRNA.
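The "recipe" in the join statement can be followed mechanically. The sketch below parses the location string and pastes the intervals together; GenBank coordinates are 1-based and inclusive, so each interval `start..end` corresponds to the Python slice `[start - 1:end]`. The function names and the toy sequence are hypothetical, and the sketch deliberately ignores `complement(...)` and partial-location markers such as `<` and `>`.

```python
import re


def parse_join(location):
    """Return the 1-based, inclusive (start, end) intervals of a location."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def extract_cds(sequence, location):
    """Concatenate the intervals named in `location` (forward strand only)."""
    return "".join(sequence[start - 1:end] for start, end in parse_join(location))


# Exon lengths for the join statement quoted above.
intervals = parse_join("join(1104..1192,1306..1510,1614..1742)")
lengths = [end - start + 1 for start, end in intervals]
print(lengths, sum(lengths), sum(lengths) % 3)  # [89, 205, 129] 423 0

# Toy sequence (hypothetical): exons in upper case, the rest in lower case.
toy = "ccATGaaTTTggTAA"
print(extract_cds(toy, "join(3..5,8..10,13..15)"))  # ATGTTTTAA
```

The three intervals add up to 423 bases, a multiple of three, which is consistent with a coding sequence of 141 codons including the stop codon.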
QUESTION 1.5
The first number is the GenInfo Identifier (GI, taken from the VERSION line in the header). The subsequent
numbers are the positions (coordinates) in the original gene entry (taken from the join line).
Part 2
QUESTION 2.1.1
a) 210,468 hits
b) No. There is e.g. the first hit, M57671.1, "Octodon degus insulin mRNA, complete cds" which is from a
degu (http://en.wikipedia.org/wiki/Degu), a rat-like rodent from Chile. In fact, you can see in the right
side of the results page that only 11,216 hits are from human. There is no reason to expect only human
results from GenBank, since it is not a human-centric database.
c) No. There are many hits to complete or partial chromosome sequences which contain a lot of other genes.
An example is JWIN03000075.1, "Camelus dromedarius breed African isolate Drom800 Contig74, whole