Participants at the first Genomic and Open source Breeding Informatics Initiative (GOBII) workshop, held at the Boyce Thompson Institute (BTI) the week of Nov. 2, set out to plan a one-size-fits-all solution for handling big data in plant research programs.

Molecular biologists, computational biologists and software developers traveled from breeding centers in the Philippines, India and Mexico, and from Cornell University and the U.S. Department of Agriculture, to decide on the best way to store and share the trillions of data points generated in the pursuit of breeding better crops. Ultimately, the GOBII project seeks to create the architecture for a publicly accessible genomics database to accelerate the development of improved crop varieties.

GOBII researchers work with breeding centers associated with the Consultative Group for International Agricultural Research, a consortium that supports agricultural research for global development. The centers work to facilitate crop improvement, with the goal of increasing plant yield, nutritional value and resilience in the face of climate change.

The database will need to be robust enough to handle a monumental amount of data of multiple types, while also being user-friendly so that plant breeders can efficiently make use of the information – a task equivalent to “finding a shirt that fits everyone,” said Kevin Palis, a software developer at the International Rice Research Institute (IRRI) in Los Baños, Philippines.

Breeding centers may sequence tens of thousands of varieties of a single crop to create a catalog of millions of genetic markers for traits such as disease resistance or heat tolerance. These mountains of data can be used for a plant-breeding strategy called genomic selection, which uses statistical modeling to predict how a new plant variety will perform before it is tested in the field. But to use these markers to make better, faster choices, breeders need tools to access and analyze the information. The GOBII project hopes to bridge the gap between plant breeders and available genomic resources to yield better crops, especially in developing countries.

“There’s so much information that one can store, and all the centers have overlapping needs, so the goal is to come up with the core requirements that are going to satisfy all the centers,” said Yaw Nti-Addae, GOBII’s lead software developer. Nti-Addae said that the four-day workshop was successful in bringing the interested parties face to face and in planning out a roadmap for the project.

In April 2015, the group received $18.5 million from the Bill & Melinda Gates Foundation through Cornell to create a breeding database for five major staple crops – wheat, rice, maize, sorghum and chickpea – but ultimately, its members hope to develop a system that will work for any crop.

Until now, researchers working on a single crop have maintained their own data sets, using a variety of platforms, formats and terminology that make the data hard to share. IRRI has developed the International Rice Information System, but plenty of data is still sitting in individual spreadsheets.

“We don’t have [a database] set up yet and we don’t have that much capability to develop something,” said Victor Jun Ulat, a bioinformatician at the International Maize and Wheat Improvement Center in Texcoco, Mexico.

BTI’s Lukas Mueller, associate professor, is a collaborator on the project. His lab has developed CassavaBase, a database of genomic data and physical traits from thousands of cassava varieties. Peter Bradbury, a USDA computational biologist who works on TASSEL, a software program that analyzes sequence data to find markers associated with plant traits, also attended the workshop.

“The plant people moved into the big data realm,” said Ramil Mauleon, a bioinformatics specialist at IRRI, “and now we have to find a way to get a handle on it.”

Patricia Waldron is the staff science writer for the Boyce Thompson Institute.