Apoorv Saxena, lead product manager at Google and co-founder of the AI Frontiers conference that will be held in Santa Clara, Calif., from November 3-5, speaks with Knowledge at Wharton about why interest in artificial intelligence is growing, what is likely to happen in the near future and which challenges will take longer to overcome. [Knowledge at Wharton is a media partner for the conference.]
An edited transcript of the conversation follows.
Knowledge at Wharton: Interest in artificial intelligence has picked up dramatically in recent times. What is driving this hype? What are some of the biggest prevailing misconceptions about AI and how would you separate the hype from reality?
Apoorv Saxena: There are multiple factors driving strong interest in AI recently. The first is significant gains on long-standing problems in AI, mostly problems of image and speech understanding. For example, computers are now able to transcribe human speech better than humans. Researchers have worked on speech understanding for 20 to 30 years, and only recently have we seen significant gains in that area. The same is true of image understanding, and also of specific parts of human language understanding, such as translation.
Such progress has been made possible by applying an old technique called deep learning and running it on highly distributed and scalable computing infrastructure. Together with the availability of large amounts of data to train these algorithms and easy-to-use tools for building AI models, these are the major factors driving interest in AI.
It is natural for people to project the recent successes in specific domains into the future. Some even extend these successes to domains where deep learning has not been very effective, and that creates a lot of misconceptions and hype. AI is still pretty bad at learning new concepts and extending that learning to new contexts.
For example, AI systems still require a tremendous amount of data to train. Humans do not need to look at 40,000 images of cats to identify a cat. A human child can look at two cats, figure out what a cat is and distinguish it from a dog. So today's AI systems are nowhere close to replicating how the human mind learns. That will be a challenge for the foreseeable future.
Knowledge at Wharton: How would you separate the hype from the reality?
Saxena: A lot of the hype comes from extrapolating current trends while ignoring what it takes to turn a research paper into an engineered product. As a product manager responsible for building products using the latest AI technology, I am constantly trying to separate the hype from reality. The best way to do this is to combine the healthy skepticism of an engineer with the optimism of a researcher. You need to understand the technical principles behind the latest cool AI demo and extrapolate only the parts of the technology that have firm technical grounding. For example, if you understand the underlying drivers of improvements in, say, speech recognition, it becomes easy to extrapolate the upcoming improvements in speech recognition quality. Combine that with healthy skepticism about where natural language understanding is today, and you will be able to identify the right opportunities, for instance which pieces of a call center's workflow will be automated in the near future.
Knowledge at Wharton: What is possible with AI in the near term, and what is more difficult to do?
Saxena: As I mentioned, in narrow domains such as speech recognition, AI is now more sophisticated than the best humans, while in more general domains that require reasoning, context understanding and goal seeking, AI can't even compete with a five-year-old child. I think AI systems still have not figured out how to do unsupervised learning well, how to train on a very limited amount of data, or how to train without a lot of human intervention. That is the main thing that will continue to be difficult. None of the recent research has shown much progress here.
There is a very good quote from [Google engineering fellow] Geoff Hinton, who is known as the father of deep learning. I might be misquoting him, but it goes something like, "Deep learning actually spoiled AI because it made a lot of people think it can do everything when we know that it can only solve very limited kinds of problems." I think there are still significant challenges in AI, and nothing in the recent advances tells us when we will get there or that we will solve them anytime soon.
Knowledge at Wharton: AI is a vast field covering many areas, and some of them are quite confusing to non-experts. For example, you and Wharton operations, information and decisions professor Kartik Hosanagar wrote an article for Knowledge at Wharton last April about the democratization of machine learning. What is happening today in machine learning that impresses or surprises you the most?
Saxena: What impresses me is how widely AI is being used to help the world now that really easy-to-use tools are available. We have heard about farmers in Japan using AI to sort their cucumbers, separating good produce from bad. A logistics company in Africa is using AI to route packages. It always surprises me how hungry, innovative and creative people are in using AI. Even though it's limited in some ways, people are still using it and making it meaningful. I am definitely super impressed [with this phenomenon].
Knowledge at Wharton: In addition to machine learning, you also referred a couple of times to deep learning. For many of our readers who are not experts in AI, could you explain how deep learning differs from machine learning? What are some of the biggest breakthroughs in deep learning?
Saxena: Machine learning is much broader than deep learning. Machine learning is essentially a computer learning patterns from data and using the learned patterns to make predictions on new data. Deep learning is a specific machine learning technique.
Deep learning is loosely modeled on how human brains are thought to learn. It uses neural networks, layered networks of neurons, to learn patterns from data and make predictions. Just as humans use different levels of conceptualization to understand a complex problem, each layer of neurons abstracts out a specific feature or concept in a hierarchical way to understand complex patterns. And the beauty of deep learning is that, unlike other machine learning techniques whose prediction performance plateaus when you feed in more training data, deep learning performance continues to improve with more data. Deep learning has also been applied to very different sets of problems and shown good performance, which is typically not possible with other techniques. All of this makes deep learning special, especially for problems where you can easily throw in more data and computing power.
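To make the idea of "layers of neurons" concrete, here is a minimal sketch in plain Python of a forward pass through a tiny two-layer network. The weights, layer sizes, and the choice of ReLU and sigmoid activations are hypothetical and hand-picked for illustration; a real network would learn its weights from data during training (typically via backpropagation and gradient descent), which this sketch omits.

```python
import math

def relu(u):
    return max(0.0, u)

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def dense(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of all its inputs plus a bias,
    then applies a nonlinearity."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Feed the input through the layers in order; each layer sees only the
    previous layer's output, so later layers build on earlier features."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical hand-picked weights: 3 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1], relu),
    ([[1.2, -0.7]], [0.05], sigmoid),
]

out = forward([1.0, 0.5, -1.0], layers)
print(out)  # a single score between 0 and 1
```

Stacking more `dense` layers is what makes a network "deep"; the learning part consists of adjusting the numbers in `layers` so that the outputs match the training data.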
Knowledge at Wharton: Can you talk a little bit about some of the biggest breakthroughs in deep learning that you find most impressive?
Saxena: Deep learning is an exciting field with lots of experimentation and new techniques being proposed over the last two to three years. There are two that come to mind. One is reinforcement learning, which I will explain in a minute. And the other big thing that is happening is GANs, or Generative Adversarial Networks.
Both of these are breakthroughs because they address one of the key problems in AI that I highlighted: how to learn without a lot of human supervision. In layman's terms, reinforcement learning is essentially agent-based learning, where an agent, a software program, is given an optimization goal and tries to optimize it by taking multiple paths and choosing the best path by learning from mistakes or errors. This is the same technique behind advances in machines learning to play video games, such as Atari games, and even more advanced strategy games like Go.
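The "learning from mistakes" loop can be sketched with tabular Q-learning, one classic reinforcement learning algorithm. This is an illustrative choice, not necessarily the method behind the Atari or Go results, which combined reinforcement learning with deep networks. The environment is a hypothetical five-state corridor where the agent earns a reward only on reaching the last state; the learning rate, discount factor and exploration rate are arbitrary assumptions.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Toy environment: reward 1.0 only when the agent reaches the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random path
            else:
                # exploit: take a best-known action, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # learn from the outcome: move the estimate toward
            # reward + discounted value of the best next action
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Nobody tells the agent which way to go; it only receives the optimization goal (the reward). After training, the greedy action in every non-goal state should be to move right (+1).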
The other big area that has generated tremendous interest involves Generative Adversarial Networks, or GANs for short. In layman's terms, think about someone learning something with a buddy. We essentially have two neural network models competing with, teaching and improving each other to expedite the learning process. GANs work well for a class of problems called unsupervised learning, where you don't have a lot of labeled training data to tell the machine what to learn. GANs have been applied to make significant progress in image generation and video morphing, and many more applications are to come.
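The two-player setup can be sketched in one dimension with plain Python and hand-derived gradients. Everything here is a hypothetical simplification: the "real data" are samples from a normal distribution with mean 2, the generator can only shift random noise by a learned offset `b`, and the discriminator is a logistic regression `sigmoid(w * x + c)`. Real GANs use deep networks for both players and learn whole distributions; because this discriminator is linear, it can only notice differences in the mean, so the generator here learns only a shift.

```python
import math
import random

def sigmoid(u):
    # Numerically safe logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def train_toy_gan(real_mean=2.0, steps=3000, batch=64, lr=0.1, seed=0):
    rng = random.Random(seed)
    b = -2.0          # generator G(z) = z + b, starting far from the real data
    w, c = 0.0, 0.0   # discriminator D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: gradient ascent on mean log D(real) + log(1 - D(fake)).
        gw = gc = 0.0
        for x in real:
            err = 1.0 - sigmoid(w * x + c)
            gw += err * x
            gc += err
        for x in fake:
            p = sigmoid(w * x + c)
            gw -= p * x
            gc -= p
        w += lr * gw / batch
        c += lr * gc / batch

        # Generator step: gradient ascent on mean log D(G(z)),
        # i.e. move the fakes toward what the discriminator calls "real".
        gb = 0.0
        for _ in range(batch):
            x = rng.gauss(0.0, 1.0) + b
            gb += (1.0 - sigmoid(w * x + c)) * w
        b += lr * gb / batch
    return b, w, c

b, w, c = train_toy_gan()
print(round(b, 2), round(w, 2), round(c, 2))
```

No one labels the real samples; the only "labels" are real versus generated, which the setup creates for itself. Each player's improvement forces the other to improve, and in this toy case the generator's offset `b` should end up close to the real mean of 2 while the discriminator drifts toward guessing 50/50.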
Knowledge at Wharton: The other area of AI that gets a lot of attention is natural language processing, often involving intelligent assistants, like Siri from Apple, Alexa from Amazon, or Cortana from Microsoft. How are chatbots evolving, and what is the future of the chatbot?
Saxena: This is a huge area of investment for all of the big players, as you mentioned. It is generating a lot of interest for two reasons. It is the most natural way for people to interact with machines: people just talk to them, and the machines understand. This has led to a fundamental shift in how computers and humans interact. Almost everybody believes this will be the next big thing.
Still, early versions of this technology have been very disappointing. The reason is that natural language understanding or processing is extremely tough. You can't take a single technique or deep learning model and solve everything, as you can for image understanding or speech understanding. Natural language understanding is inherently different. Understanding natural language or conversation requires huge amounts of human and background knowledge. Because there's so much context associated with language, unless you teach your agent all of that human knowledge, it falls short at understanding even basic things.
That's where the challenge is. All the big companies you mentioned are investing heavily in this area. I see progress being made within narrow domains, such as ordering a pizza or solving problems like, "My bank account is running low, can you allow me to make this transaction?" Such problems will get solved in the near term. But more open-ended discussions, such as your AI assistant acting as your psychiatrist, are much further out, because they require a deeper understanding of human knowledge and emotions than AI will have for the foreseeable future.
Knowledge at Wharton: What do you think is the future of the chatbot?
Saxena: As I said, chatbots will do well when they operate within specific vertical domains and contexts. When the context is fixed and doesn't vary, and, more importantly, when the user's expectations of the chatbot are limited, I think chatbots will do really well.
Another area where we have seen chatbots used is what we call goal-oriented conversations. For example, setting up a meeting or an appointment between two people can be completely handed over to a chatbot. Here the context is very limited: coordinating the calendars of two people, or making a reservation at a restaurant. Instead of a human being calling a restaurant to make a reservation, a chatbot can do it automatically, because the task and context are both very well defined. Anything beyond that is still difficult in my view.
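Why a fixed, well-defined context makes goal-oriented conversation tractable can be seen in a hypothetical slot-filling sketch for the restaurant-reservation example: the bot needs only three pieces of information and keeps asking until it has them. The slot names, prompts and regex rules below are invented for illustration; real assistants typically rely on trained language-understanding models rather than hand-written patterns.

```python
import re

REQUIRED_SLOTS = ("party_size", "day", "time")
PROMPTS = {
    "party_size": "How many people?",
    "day": "Which day would you like?",
    "time": "What time?",
}
DAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")

def extract_slots(message):
    """Rough keyword/regex extraction; it only works because the domain is narrow."""
    text = message.lower()
    found = {}
    size = re.search(r"\b(?:for|party of)\s+(\d+)\b", text)
    if size:
        found["party_size"] = int(size.group(1))
    for day in DAYS:
        if day in text:
            found["day"] = day
            break
    t = re.search(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b", text)
    if t:
        # Convert a 12-hour clock time to 24-hour "HH:MM".
        hour = int(t.group(1)) % 12 + (12 if t.group(3) == "pm" else 0)
        found["time"] = f"{hour:02d}:{t.group(2) or '00'}"
    return found

def reservation_bot(messages):
    """Fill the required slots turn by turn; return the bot's replies and the slots."""
    slots, replies = {}, []
    for message in messages:
        slots.update(extract_slots(message))
        missing = [s for s in REQUIRED_SLOTS if s not in slots]
        if missing:
            replies.append(PROMPTS[missing[0]])  # ask for the next missing detail
        else:
            replies.append(
                f"Booked a table for {slots['party_size']} "
                f"on {slots['day'].title()} at {slots['time']}."
            )
            break
    return replies, slots

replies, slots = reservation_bot(["I'd like a table for 4 on Friday", "7:30 pm please"])
print(replies)
```

Anything outside the three slots, such as small talk or an unexpected request, is simply invisible to this bot, which illustrates why such systems succeed only when the task and the user's expectations are tightly bounded.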
Knowledge at Wharton: What is computer vision? Is it possible to make machines understand video the way that human beings do? What are the most promising business applications here, and the biggest challenges in making them a reality?
Saxena: Computer vision is the science of understanding images and videos. One example of image understanding is identifying the objects in an image. The same goes for videos: in a video, you consider the different scenes you see as well as the different people and objects in each scene.
It is also possible, or increasingly becoming possible, to describe each scene by correlating different images, scenes or frames within the video, so that AI can watch a video and summarize what it saw. All of this is within the realm of computer vision, or visual understanding.
There are many areas where computer vision can be applied. One promising application is surveillance, where we have the ability to detect anomalies in surveillance video. Another big application is self-driving vehicles, where AI enables cars to understand what is on the road, detect objects and then make decisions based on them.
On the video front, I clearly see huge improvements. Video is called dark data today for a reason: our ability to understand it is pretty limited. But imagine a world where machines can start understanding what's in a video. In the near future you will see tremendous advances in machines helping humans generate videos on their own. It will not be completely automated, but one of the risks here is the ability to create fake videos. You may recently have seen a video, pretty popular on social media, of Barack Obama delivering fake messages. It is very easy to combine video morphing and lip-sync technology to make anybody believe anything. That really caused a stir in this space. So the ability to modify video, make changes in it and make it realistic is going to be a huge challenge as well as a huge opportunity. That is coming.
Knowledge at Wharton: That sounds incredible. Now, a number of big companies are active in AI, especially Google, Microsoft, Amazon and Apple in the U.S., and Baidu, Alibaba and Tencent in China. What opportunities exist in AI for startups and smaller companies? How can they add value? How do you see them fitting into the broader AI ecosystem?
Saxena: I see value for both big and small companies. A lot of the investment by the big players in this space goes into building platforms on which others can build AI applications. Almost every player in the AI space, including Google, has created such platforms. This is similar to what happened with Android and other mobile platforms: once the platform is built, others can build applications on it. So clearly that is where the focus is, and there is a big opportunity for startups to build applications using some of the open source tools created by these big players.
The second area where startups will continue to play is what we call vertical domains. A big part of the advances in AI will come from combining good algorithms with proprietary data. Even though the Googles of the world and other big players have some of the best engineering talent and algorithms, they don't have that data. So, for example, a company that has proprietary health care data can build a health care AI startup and compete with the big players. The same is true of industries such as finance or retail.
Knowledge at Wharton: Can you give any examples of startups that are doing the most significant work in AI? Why is their work important?
Saxena: There have not been many breakout successes among AI-centric startups yet. By breakout successes, I mean multimillion- or even billion-dollar startups. There are a lot of promising startups across the board. For example, I have seen startups doing well in customer service, and some good startups in HR automation.
Knowledge at Wharton: What are the top three areas in AI that everybody should be paying attention to in the next 12 to 24 months and why?
Saxena: I think the intersection of robotics and AI is going to be interesting. Robotics has been disappointing for a long time in terms of wide-scale adoption, and this is one area where I would say the combination of AI and robotics is going to be interesting. You will see some noteworthy applications coming up in that space. More human-like robots will be one big area, with advances in natural language understanding, visual understanding and, of course, robotics. That is one area I would definitely watch.
Self-driving cars are also a critical area. Within the next few years we will see commercial deployment of self-driving cars.
I am bullish on some of the advances we will see in video understanding. Video understanding combined with virtual reality could create some interesting breakthroughs. That is another area we should keep watching. The common theme I see is not AI in particular, but AI combined with some other domain. That can create some compelling use cases in the near future.