A University of Texas at Austin scientist, working with an international research team, has developed the most precise sequence map yet of U.S. cotton and will soon create an even more detailed map for navigating the complex cotton genome.
The finding may help lead to an inexpensive version of American cotton that rivals the quality of luxurious Egyptian cotton. It may also help researchers develop crops that use less water and fewer pesticides, yielding a cotton that is easier on the skin and easier on the land.
Z. Jeffrey Chen and his collaborators, Tianzhen Zhang and Wangzhen Guo at Nanjing Agricultural University (NAU) in China, describe the new draft genomic sequence in a paper published today in Nature Biotechnology. Chen and another group of scientists also learned earlier this month that they will receive $2.3 million in funding from the National Science Foundation to take the draft road map to completion.
Having a detailed genetic road map of an American crop that has recently fallen on tough times could prove critical to the industry. Cotton is known as the most important renewable fiber crop, and its seeds and oils are used in food for humans and livestock, as well as in fertilizer.
The species of cotton Chen’s team is sequencing is Upland (or American) cotton (Gossypium hirsutum), which makes up 95 percent of all cotton grown. A well-defined map of the Upland cotton genome would allow researchers and growers to better understand the interactions between genes that determine the species’ ability to produce better fibers and seed oils.
“Only about 20-30 percent of the cells on the cotton seed surface actually produce fibers, and no one knows why,” said Chen, the D.J. Sibley Centennial Professor in Plant Molecular Genetics. “Knowing more about the sequence map could allow for more and better cotton to be produced on every seed or plant.”
Help for a Struggling Industry
The U.S. is a leading exporter of cotton, and Texas grows more cotton than any other state, producing nearly 30 percent of the nation’s supply. The crop generates an estimated $2 billion per year for the state’s economy, but droughts and other weather-related changes since 2010 have led to steep losses for Texas cotton farmers, with yields down in some areas by as much as 50 percent.
Within this context, scientists have made several efforts to sequence the cotton genomes. The task is complicated for Upland cotton, a plant that formed from two ancestral species 1.5 million years ago and carries twice as many copies of genes as the presumed parent plants. Sequencing the parent species, as previous researchers have done, is easier but offers only a rough approximation of American cotton, much the way that a map from the 1800s would provide incomplete information for someone trying to navigate a modern city.
David Stelly of Texas A&M, a collaborator with Chen on both the paper and the new grant, noted that the draft cotton genome sequence will be a “game-changer,” launching a new era in cotton research, education and breeding.
“This sequence will accelerate genetic and breeding improvement of cotton production,” said Zhang, director of the Cotton Research Institute at NAU.
Although the new map is accurate, the new grant will allow Chen and researchers from Texas A&M University, Clemson University, Alcorn State University and HudsonAlpha Institute for Biotechnology to create a higher-resolution, or “gold standard,” map of the Upland cotton genome. Their novel sequencing method will be able to account for changes that occurred in Upland cotton over more than a million years and could also be applied to other important crops.
Upland allotetraploid cotton comes from two ancestral species closely related to today’s G. raimondii and G. arboreum or G. herbaceum. In a diploid species, progeny receive one set of chromosomes from each parent, giving two copies (alleles) of each gene. In this case, however, the polyploid species received two sets of chromosomes from each of its two parents, leaving the progeny with four sets, a state known as tetraploidy.
The interactions between the genes of these two ancestral species are what give modern-day cotton its best qualities. Somehow, these two different lineages come together to create something greater than the sum of its parts.
For instance, one parent species produces cotton fibers with a length only half that of Upland cotton. Meanwhile, the other parent makes virtually no cotton fibers at all. Yet their tetraploid offspring creates high-quality fibers that are much longer than those of either parent. This is related to selection and domestication for superior fibers in modern cotton crops.
Unfortunately, the tetraploid status of Upland cotton, with each gene having four copies instead of two, makes it more complicated to piece the cotton genome together correctly.
To overcome this difficulty, Chen and colleagues from institutions in China, Australia and the U.S. created a draft map using whole-genome shotgun (WGS) sequences, combined with bacterial artificial chromosome (BAC) end sequences and genotyping-by-sequencing (GBS) data. The draft map, described in Nature Biotechnology, is a major step toward the completion of a reference-grade sequence of the Upland cotton genome.
A Higher-Resolution Map
Although the draft sequence is the most complete one to date, Chen notes that there are still many holes that need to be filled.
“It’s like a map from Austin to Dallas: you know Austin is here and Dallas is there, but some cities may be missing,” said Chen. “The next step is to provide a Google-resolution map.”
In the new project, Chen and his fellow researchers will employ a new integrated approach, sequencing individual DNA snippets (stored in bacterial artificial chromosomes) that cover each chromosome from beginning to end. This will allow them to separate the DNA that originated from the two ancestral species for roughly 85 percent of the genome.
The remaining missing pieces and gaps will be filled by whole-genome next-generation sequences, as well as much longer sequences that are generated by third-generation sequencing technologies.
“Most crop species are also polyploid,” said Chen, referring to plants whose cells contain more than two sets of chromosomes. “Wheat, oats, canola, coffee, bananas, strawberries, sugarcane and basically everything that we eat, that we drink, and that we live on is polyploid.”
The new research from Chen and colleagues was written with contributors in China, Australia and the United States from Novogene Bioinformatics Institute, Clemson University, the U.S. Department of Agriculture, Texas A&M University, the Shanghai Institutes for Biological Sciences, the Commonwealth Scientific and Industrial Research Organization, Mississippi State University and Cotton Incorporated.