An article published recently in the open-access journal GigaScience provides data that effectively triples the number of plant species with available genome data. This mammoth amount of work comes on the back of growing efforts by the scientific community to sequence more plant genomes, both to aid in understanding their complex evolution and to provide practical information for improving agricultural yield. To date, around 350 land plant genomes have been sequenced. The desire for more plant genome sequences was recently highlighted by the announcement of the 10KP project, which aims ultimately to sequence 10,000 plant genomes to resolve the evolution of all the major branches of the plant tree of life. The work here provides images, raw sequencing data, assembled chloroplast genomes, and preliminary nuclear genome assemblies, all freely available. Effectively, this work is a digital representation of an entire botanical garden.
Researchers from the China National GeneBank, BGI, and the Forestry Bureau of Ruili, China have collected and sequenced 761 samples, representing 689 vascular plant species from 137 families and 49 orders. The plant samples all come from in and around the 500-hectare Botanical Garden in Ruili, a subtropical part of China bordering Myanmar. Located in a biologically rich part of China, the garden is committed to protecting endangered and Chinese-endemic plants. This project is the world’s first scientific and systematic attempt to digitize a whole botanical garden based on genomic as well as voucher specimen information.
In addition to the basic challenge of carrying out DNA sequencing on this number of species, another major task was scaling up the species identification, digitizing images of the specimens, and building a new China National GeneBank (CNGB) herbarium in Shenzhen for their storage. So far, of the 761 specimens, sequence and chloroplast data have enabled the identification of 257 plants at the species level and 504 at the family level.
To promote more extensive data sharing than just making sequence data available, the researchers are also making the digitized images available and providing access to the herbarium. The Herbarium (HCNGB) serves as a living plant database that records the position of species grown in the Ruili Botanical Garden and monitors the status of each species.
All the digital data generated here (images, raw sequencing data, assembled chloroplast genomes, and preliminary nuclear genome assemblies) are available via the NCBI SRA, the GigaScience GigaDB database, and the China National GeneBank CNSA. Additionally, so that the data can be searched and the genomes and species identifications updated, the metadata is indexed and linked via DataCite and GigaDB. All resources are released without restriction.
Liu H. et al. Molecular digitization of a botanical garden: high-depth whole genome sequencing of 689 vascular plant species from the Ruili Botanical Garden, GigaScience (2019). DOI: 10.1093/gigascience/giz007
Liu H. et al. Genomic and Imaging Data Supporting the Digitization of Ruili Botanical Garden, GigaScience Database (2019). DOI: 10.5524/100502