Can Big Data Feed the World?

In 1968, Paul Ehrlich began his bestselling book, The Population Bomb, by asserting that a rapidly expanding population was dooming the human race to mass starvation. “The battle to feed all of humanity is over,” he wrote. “In the 1970s hundreds of millions of people will starve to death in spite of any crash programs embarked upon now. At this late date, nothing can prevent a substantial increase in the world death rate.”

Ehrlich was right about the population explosion. Between 1960 and 2000, the number of people on the planet doubled, from three billion to six billion. Yet, while hunger persisted in much of the world, Ehrlich’s dire prediction of mass starvation did not come true. On a global scale, food production kept pace with population growth.

What Ehrlich failed to predict was the Green Revolution, a term first used in 1968, the same year that The Population Bomb was published. During the next 35 years, a combination of higher-yielding crops, irrigation, pesticides, herbicides and synthetic fertilizer increased worldwide grain production by 250%. But the dramatic increase in food came at a huge environmental cost. Extensive use of fertilizers and pesticides polluted waterways and killed beneficial insects; irrigation practices reduced groundwater reserves; and monoculture farming led to a wide range of problems, including a growing dependence on even more pesticide and fertilizer.

Today, with the benefits of the Green Revolution winding down and the negative impacts increasing, feeding the world has once again become a daunting challenge. There are already nearly a billion people without enough to eat and the United Nations Department of Economic and Social Affairs predicts that by 2050 there will be 1.7 billion more people to feed. That’s a 24% increase in world population during a period when increasing droughts and floods are likely to cause dwindling supplies of arable land and potable water.

The question now is, can science and technology achieve the same kind of results in the next 35 years as they did during the Green Revolution, this time without the widespread environmental damage? Judging by the information generated by “Sustainability in the Age of Big Data,” a conference sponsored by Xerox and the Initiative for Global Environmental Leadership (IGEL), the future looks promising. A great deal of work still needs to be done, but Big Data now offers the hope that the Green Revolution will be succeeded by a more sustainable Evergreen Revolution.

Big Data Speeds Plant Breeding

Plants have been cross-breeding on their own for eons, and people have been manipulating the process to achieve desired traits for centuries. The strawberry as we know it began to take shape several hundred years ago when a Chilean variety, which produced big bland fruit, and a U.S. variety, which produced small, intensely flavored fruit, were planted next to each other in a French garden. The offspring of those two plants was a hybrid with big, red, flavorful strawberries.

Since then, hybrid farmers, and more recently scientists, have been improving strawberries through breeding programs. They have intentionally cross-pollinated varieties with desirable traits, selected the new hybrids with the most promise and then repeated the process down through the generations. The results are all around us: plants well suited to growing conditions in specific regions, yielding great quantities of fruit that can be harvested early, shipped long distances and arrive ripe, undamaged and ready for the grocery shelf.

The traditional process involved in creating such successful varieties is costly and labor intensive, and it can take 10 years or more. What Big Data does is speed things up. Now, explains James C. Carrington, president of the Donald Danforth Plant Science Center, an independent, nonprofit research center, marker-assisted breeding techniques allow scientists to determine in the lab, within a matter of days, “which of the progeny of the cross, which of the seeds, contain the combination of traits that you want. In other words, you can get an analytical readout from a machine, and you don’t have to wait until you test and grow up all of the seed.” Naturally, this shortens development time dramatically.


The work of marker-assisted breeding does not involve the genetic manipulation used to produce genetically modified plants or GMOs, which is why a number of environmental activists support it. Instead, it starts with sequencing the genome of a particular crop. “Ten years ago, if you sequenced a genome, it would have cost you $100 million. Nowadays, it’s almost for nothing,” Carrington says. Sequencing the billions of nucleotides in a plant’s genome is now “frankly trivial to do.”

The first significant challenge is to identify all the genetic variations that exist within a crop species worldwide by sequencing and comparing the genomes of hundreds of thousands of different varieties, both wild and domestic. The next task is to figure out which genetic variations control or influence which traits. To accomplish this, scientists grow seedlings both in controlled laboratory conditions and in the field, using an automated process to photograph them on a regular basis. These images allow scientists to determine each plant’s observable traits, or phenotype.

At this point, the analytical work truly enters the realm of Big Data. The phenotype information, images representing roughly 100 terabytes of unstructured data, is integrated with the vast database of sequenced genetic information that has been compiled from all the existing varieties. Analytical programs sift through this massive data set to determine which minute differences among the billions of nucleotides in each genome are associated with which traits. Complicating this already daunting task is the fact that many traits are governed by the interaction of multiple genes.
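The core idea of linking genetic differences to traits can be illustrated with a toy sketch. The Python below is only an illustration under stated assumptions: the plants, marker names (`m1`, `m2`), allele calls and heights are all invented, and the function `allele_effects` is hypothetical. It simply asks, for each marker, how far apart the average trait values are for plants carrying different alleles. Real analyses work on vastly larger data sets, use proper statistical tests, and must account for the multi-gene interactions mentioned above.

```python
from statistics import mean

# Hypothetical toy data: one allele call per marker and one measured trait
# (for example, plant height in cm derived from automated photographs).
plants = [
    {"genotype": {"m1": "A", "m2": "G"}, "height_cm": 52.0},
    {"genotype": {"m1": "A", "m2": "T"}, "height_cm": 50.0},
    {"genotype": {"m1": "C", "m2": "G"}, "height_cm": 40.0},
    {"genotype": {"m1": "C", "m2": "T"}, "height_cm": 42.0},
]


def allele_effects(plants, trait):
    """For each marker, return the gap between the highest and lowest
    mean trait value across the alleles observed at that marker."""
    effects = {}
    for marker in plants[0]["genotype"]:
        by_allele = {}
        for plant in plants:
            allele = plant["genotype"][marker]
            by_allele.setdefault(allele, []).append(plant[trait])
        means = [mean(values) for values in by_allele.values()]
        effects[marker] = max(means) - min(means)
    return effects


effects = allele_effects(plants, "height_cm")
# Rank markers by how strongly allele identity tracks the trait.
ranked = sorted(effects, key=effects.get, reverse=True)
print(effects, ranked)
```

In this invented data set, height differs sharply between the two `m1` alleles and not at all between the two `m2` alleles, so `m1` ranks first as the marker worth a closer look.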

That’s not the end of the process, because all of the work done to this point has been limited to just one particular set of environments. And since plant genes interact with the environment — the same seed will produce a different plant in Missouri than it will in Maine — scientists have to determine and incorporate in their data how genomes perform across a wide range of environments. Only then can a breeder know which exact hybrid is best for a particular area.

In regions with robust economies, the considerable upfront costs of this Big Data approach are largely borne by seed companies, which profit by developing and selling hybrids that have traits making them ideal for given environments and uses. Nature supports this business model with a phenomenon known as hybrid vigor.

“Hybrid vigor is an almost universal, still very mysterious characteristic,” Carrington explains. Purebred plants (or animals) that have been extensively bred to produce virtually identical offspring are far less robust than less-refined varieties. Cross two purebreds, however, and you get offspring that embody the traits of both parents and are every bit as vigorous as the original hybrids with which the whole process started. And all the seeds from that first generation will produce identical plants.

“But if you cross that same big, strong, vigorous plant with itself, self-fertilize it, and collect the seeds from that next generation, the plants will be highly, highly variable,” explains Carrington. That’s why the seed companies advise farmers not to plant the seeds that come from the first generation, because if they do, the plants they get will be extremely variable. The companies do not, as many mistakenly believe, render the seeds sterile or forbid farmers from planting them. Farmers take the companies’ advice about not planting the seeds because they know that the next generation will be nothing like the crop they originally planted.

So the farmers come back to the seed companies each season to buy the carefully grown first-generation offspring of the same two purebreds. Carrington describes this as a win-win business proposition: The seed companies profit from continuing sales and farmers profit from seeds that produce row after row of vigorous crops, all with the traits they paid for.
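Why first-generation seed is uniform while saved seed is not can be seen in a deliberately simplified model. The Python sketch below assumes a single gene with two versions, written `A` and `a`, and two purebred parents `AA` and `aa`; these labels and the `cross` function are illustrative assumptions, not anything from Carrington. It enumerates the equally likely offspring of a cross. It does not model hybrid vigor itself, only the uniformity of the first generation and the variability of the second.

```python
from collections import Counter
from itertools import product


def cross(parent_a, parent_b):
    """Enumerate the equally likely offspring genotypes at a single gene.
    A genotype is a two-letter string such as "AA", "Aa" or "aa"."""
    offspring = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        genotype = "".join(sorted(allele_a + allele_b))
        offspring[genotype] += 1
    return offspring


f1 = cross("AA", "aa")  # two purebreds: every seed is the same hybrid
f2 = cross("Aa", "Aa")  # self-fertilized hybrid: offspring segregate
print(f1, f2)
```

Even with one gene, the second generation splits three ways. A real hybrid differs from its parents at many genes at once, so the number of possible combinations in saved seed grows very quickly.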

Data-driven Planting, Better Results

When it comes to planting the hybrid seeds they buy, farmers are generally guided by experience — their own and that of past generations who have worked the same land. The decisions they make necessarily rely on averages: on average, that variety of corn does best in that field when the seeds are planted this far apart. “But when you look at an individual field, it’s really not about averages,” says Ted Crosbie, distinguished science fellow at Monsanto. “When we average things out, we average out a lot of value.”


The goal of Monsanto’s Integrated Farming Systems (IFS) research platform is to enable farmers to move beyond averages. FieldScripts, which will be launched commercially in Illinois, Indiana, Iowa and Minnesota this year, is the first initiative to emerge from the program. It uses Big Data to determine which hybrids are best suited for particular fields and to provide a prescription for variable rate planting that is designed to maximize yield.

Using inputs from the farmer that include detailed information about each field — boundaries, yield data, soil test results — and information about Monsanto hybrids, FieldScripts delivers the variable rate seeding plan directly to the FieldView iPad app, which the farmer connects to the monitor in the planter cab so that the machine can execute the script. Over time, the farmer will capture the results of each harvest and feed this additional information back into the system.
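To make the idea of a variable rate seeding plan concrete, here is a minimal Python sketch. It is not FieldScripts' method, which the article does not describe. It assumes, purely for illustration, that a field is divided into zones, that each zone has a historical average yield, and that the seeding rate is scaled linearly between a minimum and maximum rate. The function name, zone names, yields and rates are all hypothetical.

```python
def seeding_prescription(zones, min_rate, max_rate):
    """Scale each zone's seeding rate linearly with its historical yield,
    from min_rate (lowest-yielding zone) to max_rate (highest-yielding)."""
    yields = [zone["avg_yield"] for zone in zones]
    low, high = min(yields), max(yields)
    span = high - low
    plan = {}
    for zone in zones:
        # If every zone yields the same, fall back to the midpoint rate.
        fraction = (zone["avg_yield"] - low) / span if span else 0.5
        plan[zone["zone_id"]] = round(min_rate + fraction * (max_rate - min_rate))
    return plan


# Hypothetical field with three zones (yields and rates are invented).
zones = [
    {"zone_id": "north", "avg_yield": 150.0},
    {"zone_id": "center", "avg_yield": 200.0},
    {"zone_id": "south", "avg_yield": 175.0},
]
plan = seeding_prescription(zones, min_rate=28000, max_rate=36000)
print(plan)
```

A single field-wide average would plant every zone at the same rate; the per-zone plan is what "moving beyond averages" looks like in data terms.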

Monsanto is investing heavily in its IFS program. It paid $250 million for Precision Planting, a company that develops software, including the FieldView technology, as well as hardware and after-market production equipment to help farmers plant seeds at depths and spacing that vary almost by the square meter.

More recently, in 2013 Monsanto purchased Climate Corporation for $930 million. The company brings Monsanto a proprietary technology platform that combines hyper-local weather monitoring, agronomic data modeling and high-resolution weather simulations. In early 2014, Climate Corporation itself purchased the soil analysis business line of Solum Inc., another agriculture technology company.

Manufacturing giant John Deere and DuPont Pioneer, the seed division of the multinational chemical company, have joined forces to compete head to head with Monsanto’s Integrated Farming Systems. They will be rolling out their Field360 products in 2014, as well.

Writing in the trade publication Modern Farmer, Erin Biba compares the Big Data battle between Monsanto and John Deere to the contest between Google and Apple. “John Deere is Apple, selling physical technology with their proprietary software built-in, while Monsanto is Google, selling software-as-a-service that farmers can download to their tablets and computer-controlled tractors,” she writes.

Hyper-local Weather Forecasts

IBM, too, has entered the Big Data agriculture arena in a big way. At the heart of its efforts is Deep Thunder, a nickname echoing IBM’s chess-playing software Deep Blue. Deep Thunder incorporates a wide variety of inputs to generate hyper-local weather forecasts. “We’re data scavengers,” says Lloyd Treinish, an IBM distinguished engineer and the chief scientist for Deep Thunder. “We’ll use whatever data we can get our hands on, from public and private sources, that will drive forecasts.”

In Flint River, Georgia, where a pilot project is now underway, Deep Thunder “ingests in real time” atmospheric data from the National Oceanic and Atmospheric Administration (NOAA); terrestrial data (including topography, soil type, land use, vegetation and water temperature) gathered by sensors aboard NASA spacecraft; as well as additional data from the U.S. Geological Survey. The program also pulls in data from private weather stations, which can outnumber government stations 10 to one in some areas.

The program uses all this data to forecast the weather every 10 minutes for each 1.5 square kilometers of farmland, over an area of approximately 40,000 square kilometers. (Standard weather forecasts, by contrast, predict the weather in one-hour increments for areas no smaller than 12 square kilometers.) The 72-hour forecast is updated every 12 hours and made available to smartphone- and tablet-toting farmers via a web portal that employs high-definition animations as well as charts and daily summaries.
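The figures above imply a sizeable jump in forecast detail, which a few lines of Python arithmetic make explicit. The sketch uses only the numbers quoted in the text; the variable names are illustrative, and the comparison assumes the standard forecast covers the same area and time span.

```python
area_km2 = 40_000        # forecast area quoted above
cell_km2 = 1.5           # Deep Thunder cell size quoted above
standard_cell_km2 = 12   # smallest standard forecast cell quoted above
forecast_hours = 72
step_minutes = 10
standard_step_minutes = 60

cells = area_km2 / cell_km2                    # about 26,667 cells
steps = forecast_hours * 60 // step_minutes    # 432 time steps
cell_forecasts = cells * steps                 # about 11.5 million per run

# Detail relative to a standard forecast over the same area and period.
detail_ratio = (standard_cell_km2 / cell_km2) * (standard_step_minutes / step_minutes)

print(round(cells), steps, round(cell_forecasts), detail_ratio)
```

In other words, each 72-hour run covers roughly 11.5 million cell-and-time-step combinations, at least 48 times as many as a standard forecast would produce for the same area.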

The Flint River project, which also involves the University of Georgia and the Flint River Partnership (a coalition of the Flint River Soil and Water Conservation District, USDA’s Natural Resources Conservation Service and The Nature Conservancy), is focusing not just on increasing farmers’ yields but also on conserving water.

In practice, this means forecasting not just precipitation at a hyper-local level (where it will rain, when, and with what volume and intensity) but also how moisture from the atmosphere will affect the level of moisture in the soil, which is what ultimately affects the growth of crops. To make such soil forecasts possible, the University of Georgia is deploying a network of soil sensors that will eventually feed their data into Deep Thunder as well.

Critical to the use of these forecasts are GPS-driven variable rate irrigation systems, common in the area (and throughout much of the developed world). These sophisticated systems have to be programmed a day in advance, so farmers can avoid wasting a great deal of water (tens of millions of gallons per farm per year) by knowing which of their fields will need watering tomorrow and which will not.

Saving water is important to Flint River agriculture, where porous soil makes drought a constant threat. But reducing agricultural water use is essential worldwide. What data from the gravity-sensing GRACE satellites “shows us is that groundwater depletion is happening at a very rapid rate in almost all of the major aquifers in the arid and semi-arid parts of the world,” says hydrologist James Famiglietti, who directs the University of California Center for Hydrologic Modeling. Since farming uses 70% of the world’s fresh water, and agricultural runoff also accounts for much of the world’s water pollution, finding ways to reduce agricultural water use, while increasing production, is critical.

Following Food

Tracking food from farm to table prevents illness, reduces waste and increases profits. As the global supply chain for food stretches further and further, the importance of tracking and monitoring agricultural products continues to grow, as does the use of sensor technology.

A sensor embedded in a pallet of tomatoes can monitor the temperature hourly while the produce is in transit, “so when the customer receives the tomatoes, they can download the data from the sensor and know whether or not the produce has been maintained at optimal temperatures the whole way,” explains Paul Chang, an IBM global leader for the smarter supply chain. This kind of data is typically recorded rather than transmitted, so it is primarily valuable retroactively. For example, a retailer might decide to discount tomatoes if their shelf life has been reduced by problems in transit.


Analytics can provide more proactive information. If the system knows how long each of the steps in the tomatoes’ supply chain is supposed to take, it can send alerts if there are any significant deviations. For instance, if the pallet is taking too long to move from the refrigerated warehouse (where it is scanned) to the refrigerated truck (where it is scanned again), an alert can be sent, and a dispatcher can take action to prevent any spoilage. “So it’s the combination of sensors that are monitoring the condition and some kind of analytics that’s associated with the business process that gives you the most robust solution,” says Chang.
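The alerting logic described here can be sketched in a few lines of Python. This is a minimal illustration, not IBM's system: the scan-point names, the time limits in `EXPECTED_MAX` and the function `is_overdue` are all hypothetical. The idea is simply to compare the time elapsed since a pallet's last scan with the time that leg of the journey is supposed to take.

```python
from datetime import datetime, timedelta

# Hypothetical maximum expected time between consecutive scan points.
EXPECTED_MAX = {
    ("warehouse", "truck"): timedelta(minutes=30),
    ("truck", "store"): timedelta(hours=8),
}


def is_overdue(last_location, last_scan_time, next_location, now):
    """Return True if the pallet has not reached its next scan point
    within the time that leg of the journey is expected to take."""
    limit = EXPECTED_MAX[(last_location, next_location)]
    return now - last_scan_time > limit


# Pallet scanned leaving the refrigerated warehouse at 8:00 (invented times).
scanned_at = datetime(2014, 9, 12, 8, 0)

on_time = is_overdue("warehouse", scanned_at, "truck", datetime(2014, 9, 12, 8, 20))
late = is_overdue("warehouse", scanned_at, "truck", datetime(2014, 9, 12, 9, 15))
print(on_time, late)
```

Because the check runs against the current time rather than the next scan, a dispatcher could be alerted while the pallet is still sitting outside refrigeration, which is what makes the information proactive rather than retroactive.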

IBM has partnered with other companies to create such systems in many areas of the world, from Thailand to Norway. The pork-tracking system established in China begins with the bar coding of each individual pig at the slaughterhouse, where cameras monitor every step in the production process. Temperature and humidity are monitored by GPS-enabled sensors, triggering alerts anywhere along the distribution route where corrective action may be needed. And point-of-sale scanning at the retailer allows prompt, efficient action if a problem or recall occurs even after the pork has been sold.

Such systems help prevent food-borne illnesses, which sicken an estimated 76 million people in the U.S. every year (leading to 5,000 deaths); reduce waste at virtually every step in the supply chain (40% of all food in developed markets is thrown away, including 10% to 15% of all produce); and often increase customer satisfaction. A 2009 IBM study showed that 76% of consumers would like more information about the origin of their food.

The Big Data Gap

Big Data is needed most by farmers who can least afford it. Major corporations are investing heavily in Big Data for agriculture, and start-ups in the space are proliferating, supported by the increasing availability of venture capital. But all this market-driven activity does little to help poor, developing areas such as sub-Saharan Africa, where productivity is very low by U.S. standards and where virtually all of the world’s population growth is predicted to take place in the coming decades.

Increased productivity in the developed world can help to feed people in these areas, “but it is impractical to think the U.S. and South America are going to produce the food for everyone,” says Carrington. “It is much more realistic to work on the assumption that Africa, Asia, Central America and parts of South America are going to have to do better in the future and be more productive. This is where science and technology really have to [have] special focus. Ironically, it’s where the least amount of investment is.”

But numerous groups are working to bring relevant technology to these areas. With primary funding from the Bill and Melinda Gates Foundation, scientists at the Danforth Center, for example, are developing virus-resistant, highly nutritious varieties of cassava, an orphan crop that is important in much of Africa but has no commercial appeal to seed companies (it doesn’t even produce seeds).

New agricultural cell phone and tablet applications are also being developed for Africa, where there are now more mobile phones than in the U.S. or Europe and where there has been a 20-fold increase in Internet bandwidth since 2008. These apps are connecting farmers to financing, market information, agricultural expertise and sometimes simply to each other so they can share information and best practices.

The applications vary depending on local needs. According to “Unlocking Africa’s Agricultural Potential,” a report by the World Bank, “Information and communication technology (ICT) applications for agriculture and rural development have generally not followed any generic blueprint. They are usually designed locally and for specific target markets, with localized content specific to languages, crop types, and farming methods.”

The common thread running through all these efforts is the need to make vital data of all types available without charge to everyone. In a recent speech at the G-8 Conference on Open Data for Agriculture, Bill Gates urged incentives for scientists and organizations to share data and the development of common data standards, easing the exchange of information between organizations and individuals.

“To reap the benefits of Big Data, it’s important to ensure this is publicly available and shared with research and development partners,” Gates said. “Only then will we be able to create a rich data ecosystem to support the knowledge-intensive and location-specific enterprise of agriculture. This is especially important in developing countries.”

Originally published as "Can Big Data Feed the World?" Knowledge@Wharton, The Wharton School, University of Pennsylvania, September 12, 2014.

