Following a period of disillusionment over the unrealized promise of big data, companies are starting to make it work, says Gary Survis, a Wharton lecturer and senior fellow at Wharton’s Initiative for Global Environmental Leadership. At the same time, since 90% of all data ever created has appeared in just the last two years, big data is also breaking the systems that hold and measure it. From seeds that can divulge how well they are growing to jet engines that collect megadata for fuel saving, some returns are coming in. In this Knowledge at Wharton video interview, Survis discusses just what constitutes big data and what it means for business today.
An edited transcript of the conversation follows.
Knowledge at Wharton: What is big data?
Gary Survis: I find that when people ask about big data and try to figure out why it is so much more than just a lot of data, the answer often is [to take] a look at how quickly we are generating data today. And it is not just traditional sources. We have gotten so much better at using devices that can capture data – whether it be cameras, the Internet or what your phone is transmitting at any given time – and all of this data comes together.
One of the things that I think is interesting is that when you consider that 90% of all of the data has been created within the last two years, you start to recognize that this is an accelerating amount of data – not just a stagnant amount, but accelerating. And that is why, when people talk about what is different about big data, you have to recognize it is breaking traditional relational database systems. It is breaking the enterprise data warehouse, which was our store of these data. It cannot handle this kind of data; it was never designed to handle it.
One of the things that is most fascinating about big data is that people do not understand what it really is. And I will tell you that it is absolutely not a “lot of data,” because there are many places in our society where we have accumulated a lot of data. But that does not really qualify as big data. There are many definitions, but I will say that the key definition that most people go to is what they call the three Vs — volume, velocity and variety.
“Ninety percent of all of the data has been created within the last two years.”
We start with volume. There has to be a lot of data and it has got to be breaking the systems that traditionally you would use in your data warehouses and your servers that you might use to store your data.
Then we talk a little bit about velocity, and that is how quickly this data is coming in. In today’s society, where data is being accumulated everywhere, data from the web and from machines is being streamed into many different databases, and that is the velocity piece. So it is coming in quickly.
But the real catch here is variety. And I do not want to get technical in terms of the differences between the different kinds of data, but at a high level we talk about structured data and unstructured data. Structured data is what you are traditionally used to seeing – rows and tables of data that we will always have. Our biggest source, by the way, is traditional spreadsheets. There is lots of columnar data.
But the thing that is really causing traditional systems to break is this unstructured data. This is the data that comes from social media, from Twitter feeds, from all of the email that goes through our systems, the rich data of sound and video and images. How do you structure those into our traditional systems?
It is when you take all those pieces together and try to understand what big data is — it is the combination of those three Vs that really defines it.
Knowledge at Wharton: How is big data changing our lives?
Survis: When people ask how big data is going to change their lives, the first answer I want to give is, if you thought the Internet was a big deal, you haven’t seen anything yet, because big data is starting to influence and impact our lives in many ways. And I am not talking about the NSA stuff. I am talking about all the ways, every day, that you are starting to see big data having an impact on your life. If you are using a navigation system that is on your smartphone, in all likelihood that is all big data in the background there. It is amassing data.
Now why do you think a company like Google made a choice to purchase a little company called Waze? Well, they made that choice because they saw the connection here. The idea of taking every person driving around in your area – their speed, their communications, what information they are passing around, the fact that they saw a police officer, and the fact that they saw a pothole — all of this information is being amassed and brought together in brand new technology that allows you to process this data in ways that you cannot even imagine.
What does it mean when we start to take advantage of this type of data? An example that I think of many times is the jet engine. An airplane in and of itself on a given day is going to make multiple stops. It is going to be landing and taking off. And every piece of that flight is captured by that jet engine. And that is called machine data, and that is data that machines are capturing. And they capture it via what are onboard computers, but also by sensors and all kinds of other information. So if you can imagine, a typical flight is going to capture about 240 megabytes of data.
“If you thought the Internet was a big deal, you haven’t seen anything yet.”
Now take that one flight – multiply it by a given day – so now that plane maybe made five flights. Let’s call that a terabyte in rounding numbers. So that airliner – that one piece of aircraft — is capturing a terabyte of information. Now that terabyte of information multiplied by every plane in the air – on any given day – then amassed – imagine how all of a sudden that you can start to get insights on how to make that plane run more efficiently. You start to see that maybe we can change the way the jet engine works so it maximizes energy efficiency. And for an airline, energy efficiency equals dollars to the bottom line.
And these are the kinds of things that happen in just a few little places, but today it is starting small and you are going to start seeing it in every part of your life.
Knowledge at Wharton: What is the future of big data?
Survis: When I look at the future for big data I really think it is early days. As much as it has a lot of flash today – there is actually a model that Gartner came up with, in which, after this whole sort of frenzy, there is a trough of disillusionment that happens. And we are actually just coming out of that trough of disillusionment because we had all this promise and then you say, okay, what can I do next?
The technology is moving so quickly. The operating systems to handle it are moving so quickly. The prices of memory are dropping so quickly. Things that you used to need disks to do can now be done in memory. All of these changes are going to allow big data to start delivering even more than what we are seeing today. And I caution people who say, well, it is just all hype. It is not all hype. And you have not even begun to see how you can take this kind of information and the kind of insight that you can get.
A great example is what is going on with NASA. They have been capturing images and information for years. But when you start taking that information and tying it against existing data, there is incredible information that satellites can tell us today. Beyond just seeing a pretty picture or knowing what is there, you can do time-lapse [photography] on the entire planet that shows deforestation, that shows water flows, that gives insight into our lives that never, ever was imagined. And that is the power of big data.
Knowledge at Wharton: What is the effect of big data on sustainability issues?
Survis: One of the great things about big data is that it is finding its way into so many places where people never imagined that you could be leveraging big data. One of the places is certainly sustainability. Businesses and corporations today have gotten to the place where a lot of the low-hanging fruit – changing their light bulbs, making some changes to their supply chains, doing the things that they know could bring some immediate dollars in – is being exhausted.
“You can do time-lapse [photography] on the entire planet that shows deforestation, that shows water flows, that gives insight into our lives that never, ever was imagined.”
The next level of sustainability for business is going to be around leveraging big data to change operations – to leverage big data to do things that allow them to save energy, that allow them to reduce their carbon footprint, that allow them to be more efficient about how they use water. And all of this begins to happen as they start accumulating the data from machines – accumulating the data from their operations – melding that with consumer information – and that is a huge piece of how business is going to be leveraging this against sustainability.
But I think another part of this is how many areas of traditional sustainability will be incredibly impacted by big data. Let’s talk about how we deal with animals and preserving animal habitats. Let’s talk about how we deal with some of the most pressing issues of humankind – how we are going to feed a population that is going to reach nine billion by 2050. These kinds of resource issues will not be solved by traditional means.
And what we are seeing are companies that are able to start taking weather data and start incorporating what we know about climate change today, and start leveraging to make farming more efficient, to change the way that farm insurance works, to start utilizing, if you can imagine, … machine data coming from seeds. The seeds are going to have data within them that will be able to transmit the efficiency of those seeds and how they are doing.
There are going to be massive, massive amounts of data today. Farming today, which was arguably one of the least technical businesses around, will become one of the most technical businesses. Companies like Monsanto are making huge investments in technology. Deere is making investments in technology. All of these companies are [doing so] because we all recognize that in the future there is going to have to be change in how we go about raising plants, raising animals, and raising enough food to feed this ever-growing population.