New genome alignment tool empowers large-scale studies of vertebrate
evolution
Date:
November 11, 2020
Source:
University of California - Santa Cruz
Summary:
Three new articles present major advances in understanding the
evolution of birds and mammals, made possible by new methods for
comparing the genomes of hundreds of species. Researchers developed
a powerful new genome alignment method that has made the new studies
possible, including the largest genome alignment ever achieved,
covering more than 600 vertebrate genomes.
FULL STORY
==========================================================================
Three papers published November 11 in Nature present major advances in
understanding the evolution of birds and mammals, made possible by new
methods for comparing the genomes of hundreds of species.
Comparative genomics uses genomic data to study the evolutionary
relationships among species and to identify DNA sequences with essential functions conserved across many species. This approach requires an
alignment of the genome sequences so that corresponding positions
in different genomes can be compared, but that becomes increasingly
difficult as the number of genomes grows.
Researchers at the UC Santa Cruz Genomics Institute developed a powerful
new genome alignment method that has made the new studies possible,
including the largest genome alignment ever achieved, covering more than
600 vertebrate genomes. The results provide a detailed view of how species
are related to each other at the genetic level.
"We're literally lining up the DNA sequences to see the corresponding
positions in each genome, so you can look at individual elements of the
genome and see in great detail what has changed and what's stayed the same
over evolutionary time," explained Benedict Paten, associate professor
of biomolecular engineering at UC Santa Cruz and a corresponding author
of two of the new papers.
Identifying DNA sequences that are conserved, remaining unchanged over
millions of years of evolution, enables scientists to pinpoint elements
of the genome that control important functions across a wide range of
species. "It tells you something is important there -- it hasn't changed because it can't -- and now we can see that with higher resolution than
ever before," Paten explained.
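To make the idea of conserved positions concrete, here is a minimal Python sketch. It assumes a tiny, made-up alignment of three sequences (the species labels and bases are purely illustrative, not data from the studies) and reports the columns where every sequence carries the same base. Real conservation analyses use statistical models over phylogenies; this only shows the basic column-by-column comparison that an alignment makes possible.

```python
def conserved_columns(alignment):
    """Return indices of alignment columns where all rows share one base.

    `alignment` maps a label to an aligned sequence; '-' marks a gap.
    Columns containing any gap are not counted as conserved.
    """
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")

    conserved = []
    for i in range(length):
        column = {row[i] for row in rows}  # distinct symbols in this column
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Hypothetical toy alignment (invented for illustration only).
toy_alignment = {
    "human": "ACGT-ACGTA",
    "mouse": "ACGTTACCTA",
    "dog":   "ACGA-ACGTA",
}

print(conserved_columns(toy_alignment))
```

Columns that differ in even one sequence, or that contain a gap, are excluded, which is why aligning more genomes gives a stricter and more informative test of conservation.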
The previous generation of alignment tools relied on comparing
everything to a single reference genome, resulting in a problem called "reference bias." Paten and coauthor Glenn Hickey originally developed a reference-free alignment program called Cactus, which was state-of-the-art
at the time, but worked only on a small scale. UCSC graduate student
Joel Armstrong (now at Google) then extended it to create a powerful new program called Progressive Cactus, which can work for hundreds and even thousands of genomes.
"Most previous alignment methods were limited by reference bias, so if
human is the reference, they could tell you a lot about the human genome's relationship to the mouse genome, and a lot about the human genome's relationship to the dog genome -- but not very much about the mouse
genome's relationship to the dog genome," Armstrong explained. "What we've
done with Progressive Cactus is work out how to avoid the reference-bias limitation while remaining efficient enough and accurate enough to handle
the massive scale of today's genome sequencing projects." Armstrong is
a lead author of all three papers, and first author of the paper that
describes Progressive Cactus and presents the results from an alignment
of 605 genomes representing hundreds of millions of years of vertebrate evolution.
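The reference-bias limitation Armstrong describes can be illustrated with a simple count. The sketch below is only a bookkeeping exercise under a simplifying assumption: that a reference-based alignment directly relates each genome only to the reference, while a reference-free alignment relates every pair. It does not reproduce how Progressive Cactus works, and the function names are hypothetical.

```python
from itertools import combinations


def reference_based_pairs(species, reference):
    """Pairs directly related when everything is aligned to one reference."""
    return {frozenset((reference, s)) for s in species if s != reference}


def reference_free_pairs(species):
    """All pairs of species, as related by a reference-free alignment."""
    return {frozenset(pair) for pair in combinations(species, 2)}


species = ["human", "mouse", "dog"]

# Pairs that the reference-based approach leaves without a direct comparison.
missing = reference_free_pairs(species) - reference_based_pairs(species, "human")
for pair in missing:
    print(sorted(pair))
```

With human as the reference, the only pair left out in this toy case is mouse and dog, matching the example in the quote. The gap widens quickly as more genomes are added, since the number of pairs grows quadratically while the number of comparisons to a single reference grows only linearly.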
This unprecedented alignment combines two smaller alignments, one for
242 placental mammals and another for 363 birds. The other two papers
focus separately on the mammal and bird genome alignments.
This international collaborative effort was coordinated by an organizing
group led by coauthors Guojie Zhang at the University of Copenhagen and
China National GeneBank, Elinor Karlsson at the Broad Institute of Harvard
and MIT, and Paten at UCSC. The genomic data used in these analyses
were generated by two broad consortia: the 10,000 Bird Genomes (B10K)
project for avian genomes and the Zoonomia project for mammalian genomes.
Scientists have been making plans for years to sequence and analyze the
genomes of tens of thousands of animals. Coauthor David Haussler, director
of the UCSC Genomics Institute, helped initiate the Genome 10K project in
2009. Related efforts include the Vertebrate Genome Project and the Earth BioGenome Project, and all of these projects are now gathering steam.
"These are very much forward-looking papers, because the methods we've developed will scale to alignments of thousands of genomes," Paten
said. "As sequencing technology gets cheaper and faster, people are
sequencing hundreds of new species, and this opens up new possibilities
for understanding evolutionary relationships and the genetic underpinnings
of biology. There is a colossal amount of information in these genomes."
==========================================================================
Story Source:
Materials provided by University of California - Santa Cruz. Original
written by Tim Stephens. Note: Content may be edited for style and length.
==========================================================================
Journal Reference:
1. Joel Armstrong, Glenn Hickey, Mark Diekhans, Ian T. Fiddes, Adam M.
Novak, Alden Deran, Qi Fang, Duo Xie, Shaohong Feng, Josefin
Stiller, Diane Genereux, Jeremy Johnson, Voichita Dana Marinescu,
Jessica Alföldi, Robert S. Harris, Kerstin Lindblad-Toh,
David Haussler, Elinor Karlsson, Erich D. Jarvis, Guojie Zhang,
Benedict Paten. Progressive Cactus is a multiple-genome aligner
for the thousand-genome era. Nature, 2020; 587 (7833): 246 DOI:
10.1038/s41586-020-2871-y
==========================================================================
Link to news story:
https://www.sciencedaily.com/releases/2020/11/201111122830.htm