A stream of surprises from the Atlantic cod genome
Close to ten years ago, researchers from the Norwegian Institute for Water Research (NIVA) caught "Calvin the Cod" and hauled him out of the cold Arctic waters during an oceanographic expedition to the Barents Sea, the northern coast of Norway and the Lofoten archipelago.
From Lofoten, Calvin’s journey went to NIVA’s research station close to Norway’s capital Oslo. The story could have ended there, but Calvin’s destiny took a sudden twist when researchers from the University of Oslo found him swimming in a tank, killed him with a blow to the head, and took samples from his body home to their big freezer at the Department of Biosciences.
An ordinary cod would have been eaten after being placed in the freezer, but Calvin the Cod instead started a new career. Calvin was in fact a healthy and characteristic representative of the population of skrei, which is the Norwegian term for cod that migrate between feeding grounds in the Barents Sea and spawning areas along the Norwegian coast. Thus, Calvin was chosen for the honorable task of donating his body parts and genes to science.
In 2008, researchers at the University of Oslo initiated a unique project: they wanted to map the genome of a fish of great economic importance, namely Atlantic cod. The project has since become a huge success, and the cod genome researchers have delivered a stream of surprises based on their studies of Calvin’s genome.
Just to name the two most important discoveries: It made a huge impact when they discovered the strange and unique immune system of cod in 2011, and in 2016 they found a sex gene that can make fish farming more profitable.
New discovery in the cod genome
PhD candidate Ole Kristian Tørresen and senior engineer Lex Nederbragt at the Department of Biosciences and the Centre for Ecological and Evolutionary Synthesis (CEES) have now conducted a more detailed analysis than was ever possible before, and once again the cod genome has come up with a surprise.
“The new achievement this time is that we have combined data from three different techniques for DNA sequencing. Thus, we managed to map the cod genome in much more detail than what was previously possible. At the same time, we found the reason why it has been so difficult to map the genome in detail earlier. The reason is that this genome contains an extraordinary amount of so-called short tandem repeats, meaning that short sequences of DNA base molecules are repeated many times in succession”, recounts Ole Kristian Tørresen.
The basis for this observation is that the genomes of all organisms are written in an "alphabet" that consists of only four nucleobase molecules: adenine (A), thymine (T), guanine (G) and cytosine (C).
93 percent of the genome is mapped
“We are talking about short tandem repeats when we find for example the combination "AC" several times in succession in the DNA sequences. But also when the DNA sequence reads for example only "A" several times in succession, or perhaps "CGA", it is classified as a tandem repeat. The bottom line is that it has been very difficult to understand what the contiguous DNA sequences really look like, when these tandem repeats appear to make up an enormous amount of exactly similar pieces in an enormous jigsaw puzzle”, explains Lex Nederbragt.
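The kind of pattern Nederbragt describes can be illustrated with a few lines of Python. This is only a minimal sketch for a toy sequence: the function name, the thresholds (units of up to three bases, at least three copies) and the example string are assumptions made for illustration, not the method used in the study.

```python
import re

def find_tandem_repeats(sequence, max_unit=3, min_copies=3):
    """Return (start, unit, copies) for each run where a short unit repeats back to back."""
    # Group 1 is the candidate unit (1..max_unit bases, tried shortest first);
    # \1{n,} then requires at least min_copies - 1 further copies right after it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# Hypothetical toy sequence containing runs of "AC", "A" and "CGA".
example = "GGTACACACACTTAAAAAGCGACGACGAT"
repeats = find_tandem_repeats(example)
for start, unit, copies in repeats:
    print(f"position {start}: '{unit}' repeated {copies} times")
```

On the toy sequence this reports the "AC", "A" and "CGA" runs mentioned in the quote. Dedicated repeat-finding tools are needed for real genomes, where repeats are often imperfect.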
The Atlantic cod genome consists of approximately 700 million pairs of DNA bases (remember that the DNA molecule is a double helix with matching base pairs on each strand). With the recent study, the researchers have now surveyed a total of 93 percent of the genome and managed to assign the sequences to the cod’s 23 chromosomes. Thus, they have managed to fill in many of the "holes" that remained after earlier surveys.
DNA mapping as a huge jigsaw puzzle
With the best technology available today, it is possible to map contiguous DNA sequences with a length of up to 10,000 base pairs. But that is a far cry from a complete cod chromosome, which contains approximately 25 million base pairs. The researchers must therefore divide the DNA strands into pieces that are then read separately.
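A back-of-the-envelope calculation with the article's round numbers shows the scale of the task. The variable names are illustrative, and the result is only a lower bound: it assumes every piece has the maximum length and that no piece overlaps another, whereas in practice each position is read many times over.

```python
import math

# Round figures quoted in the article.
GENOME_SIZE_BP = 700_000_000      # whole cod genome
CHROMOSOME_SIZE_BP = 25_000_000   # one typical chromosome
MAX_READ_LENGTH_BP = 10_000       # longest contiguous read
MAPPED_PERCENT = 93               # share of the genome now surveyed

# Fewest possible pieces if every read had the maximum length and none overlapped.
min_reads_per_chromosome = math.ceil(CHROMOSOME_SIZE_BP / MAX_READ_LENGTH_BP)
min_reads_per_genome = math.ceil(GENOME_SIZE_BP / MAX_READ_LENGTH_BP)

# Base pairs covered by the improved map (integer arithmetic avoids rounding noise).
mapped_bp = GENOME_SIZE_BP * MAPPED_PERCENT // 100

print(min_reads_per_chromosome, min_reads_per_genome, mapped_bp)
```

Even under these generous assumptions, a single chromosome breaks into at least 2,500 pieces and the whole genome into at least 70,000.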
DNA sequencing and genome mapping can thus be compared to dividing a very long text into lots of small pieces that are read separately – letter by letter, or more exactly: nucleobase by nucleobase. The next step is to create a digitized copy of the whole text, but without knowing exactly where each piece came from.
The result is that researchers are left with a large number of fragments that they must try to put together in the digitized copy – much like a jigsaw puzzle with an extreme number of pieces. Lex Nederbragt explains that the jigsaw puzzle has an added problem because of the tandem repeats that look exactly the same, similar to a “normal” puzzle with a lot of blue sky. But even if the pieces look exactly the same, the researchers must find out exactly where they came from.
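The "blue sky" problem can be made concrete with a toy example. The sketch below cuts two made-up sequences, which differ only in how many times "AC" is repeated, into overlapping six-base pieces. The sequences, the piece length and the function name are assumptions chosen for illustration, and the comparison ignores how often each piece occurs.

```python
def fragments(sequence, length):
    """Return the set of all overlapping pieces of the given length."""
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}

# Two hypothetical stretches of DNA: same flanks, different repeat counts.
version_a = "GATTC" + "AC" * 6 + "TGGA"
version_b = "GATTC" + "AC" * 9 + "TGGA"

pieces_a = fragments(version_a, 6)
pieces_b = fragments(version_b, 6)

# The originals differ, yet the distinct pieces are the same.
same_pieces = pieces_a == pieces_b
print(version_a != version_b, same_pieces)
```

Because the pieces are shorter than the repeated stretch, the two versions produce exactly the same set of pieces, so the pieces alone cannot tell how long the repeat really is. Only a piece long enough to span the whole repeat and reach into both flanks settles the question.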
The cod genome
- The genome is the complete genetic information that is encoded in an organism's DNA (or, in some viruses, in their RNA).
- The cod genome consists of approximately 700 million base pairs (the human genome has 3.2 billion base pairs).
- The cod genome is distributed over 23 chromosome pairs (as in humans).
Combining three methods
Tørresen, Nederbragt and their collaborators have solved this problem by using three different sequencing technologies in parallel, and then combining their results. Moreover, they have re-used large amounts of data from previous genomic studies and analyzed them again with new and better methods.
“The oldest method, called 454 sequencing, can identify fragments of up to 700 base pairs. But this method falls short if the fragments contain several identical bases one after another. If the fragment for example contains the sequence AAAAA, the method cannot always determine accurately how many A's the sequence consists of”, explains Tørresen.
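To see what is at stake in Tørresen's AAAAA example, the sketch below compares two made-up reads that differ only in the length of a run of A's. It does not model how the 454 instrument works. It only shows that once the run length is uncertain, the two reads become indistinguishable. The read sequences and function names are hypothetical.

```python
from itertools import groupby

def run_lengths(sequence):
    """Split a sequence into (base, run length) pairs, e.g. 'AAAT' -> [('A', 3), ('T', 1)]."""
    return [(base, len(list(run))) for base, run in groupby(sequence)]

def collapse_runs(sequence):
    """Keep one base per run, discarding the run-length information."""
    return "".join(base for base, _run in groupby(sequence))

# Hypothetical reads: five A's in one, six in the other.
read_1 = "GCTAAAAATCG"
read_2 = "GCTAAAAAATCG"

print(run_lengths(read_1)[3], run_lengths(read_2)[3])
print(collapse_runs(read_1) == collapse_runs(read_2))
```

A single-base tandem repeat is exactly such a run, which is one reason why a genome rich in short repeats is hard to read with a method that struggles to count identical bases.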
“We have also used a newer method called Illumina sequencing. This method can only generate fragments with a length of up to 100 base pairs, but the mapping is more accurate than with the 454 method. In addition, we supplemented with a third method called PacBio sequencing, which at the time of our experiments could identify DNA sequences of 2,000 to 3,000 base pairs. This helped us to see the big picture, even if the accuracy is a good deal poorer than with Illumina sequencing”, explains Nederbragt.
“The combination of three different methods is what made the more accurate results in our study possible. It is perhaps slightly annoying that the technologies have evolved in the short period since the start of our project, so that we could have had even better results if we had started today. But we can’t just sit around doing nothing while we are waiting for the technology to develop further”, Nederbragt comments.
What is the significance of the copies?
The scientists would of course like to know what the large amount of short tandem repeats means for cod as a species.
“Our suggestion is that the phenomenon has an evolutionary significance, because many of the tandem repeats are found inside DNA sequences that encode the structure of proteins. They also have a tendency to vary in length between generations. This might mean that the repeated sequences can give rise to many different varieties of the same proteins. We imagine that such a variety of proteins can make it easier for cod as a species to adapt to a new environment, but we can’t say anything definite about this yet”, comments Tørresen.
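How a repeat inside a protein-coding sequence could yield different protein variants can be sketched as follows. The miniature "gene", the choice of the three-base unit CAG and the helper names are invented for illustration. The codon table is a four-entry excerpt of the standard genetic code, not a complete table.

```python
# Excerpt of the standard genetic code: only the codons used in this toy example.
CODON_TABLE = {"ATG": "M", "CAG": "Q", "GGC": "G", "TAA": "*"}

def translate(dna):
    """Translate a coding sequence codon by codon, stopping at the stop codon '*'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def gene_with_repeat(copies):
    """Hypothetical mini-gene: start codon, a spacer, 'CAG' repeated, a spacer, stop codon."""
    return "ATG" + "GGC" + "CAG" * copies + "GGC" + "TAA"

# Changing the number of repeat copies changes the length of the protein.
variants = {copies: translate(gene_with_repeat(copies)) for copies in (3, 4, 5)}
for copies, protein in variants.items():
    print(copies, protein)
```

Each extra copy of a three-base unit adds one amino acid without disturbing the rest of the protein. A change in the copy number of a one- or two-base unit would instead shift the reading frame unless the number of bases added or lost is a multiple of three.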
It should, however, be possible to find out. Now that the scientists have mapped the cod genome in great detail, they can start identifying the genes that contain the code for specific proteins. Researchers at the Department of Biosciences are just beginning their investigations into this area.
“We have already found a fish species that has even more tandem repeats than cod, namely the related haddock. Both cod (Latin name: Gadus morhua) and haddock (Melanogrammus aeglefinus) are members of the cod family (Gadidae). This may indicate that the whole group has an increased proportion of such repetitions”, adds Nederbragt.
Ole Kristian Tørresen and Lex Nederbragt emphasize that every study of the cod genome so far has been performed on samples from the same individual, namely Calvin. But in the newly established Aqua Genome Project, researchers at the University of Oslo and the Norwegian University of Life Sciences (NMBU) are going to study the genomes of large numbers of cod and salmon. They can for example examine how different populations – such as skrei, Norwegian coastal cod and Baltic Sea cod – compare. NMBU researchers are concentrating their efforts on salmon, while researchers at the University of Oslo are continuing their in-depth studies of the cod genome.
“A never-ending story”
“This new version of the cod genome is a huge improvement and will have implications for future fisheries management, by being a reference for the sequencing of other stocks and individuals. At the same time, the improved genome comes with new potential for developing cod stocks that are more suitable for aquaculture in terms of growth, maturation and disease resistance”, says Professor Kjetill S. Jakobsen at the Department of Biosciences and CEES. He has led studies of the Atlantic cod genome at the University of Oslo from the start.
“We are very proud of this huge step forward in the understanding of the cod genome. But the assembly of such large genomes is a "never-ending story", and there is still room for improvement. I can guarantee that there will be a third version sooner or later, and it will be even better than this one but still not perfect”, adds senior adviser Sissel Jentoft at CEES.
PhD candidate Ole Kristian Tørresen, Department of Biosciences and CEES. Twitter: @tierhon
Senior engineer Lex Nederbragt, Department of Biosciences and CEES. Twitter: @lexnederbragt
Scientific publications and more information:
Ole K. Tørresen, Alexander J. Nederbragt et al.: An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics 2017, 18:95
Bastiaan Star et al.: The genome sequence of Atlantic cod reveals a unique immune system. Nature 477, 207–210 (08 September 2011).
Ole K. Tørresen et al.: The new era of genome sequencing using high-throughput sequencing technology: generation of the first version of the Atlantic cod genome. Chapter in book Genomics in Aquaculture, December 2016
The Atlantic cod genome: CEES Genome Browsers