Earlier this week TechCrunch broke the news that Google had acquired Geoff Hinton’s recently founded deep learning startup. Soon thereafter Geoff posted an announcement on his Google+ page confirming the news and his (part-time) departure to Google from the University of Toronto. From the details that have emerged so far, it appears that he will split his time between UoT and the Google offices in Toronto and Mountain View. What does Geoff’s move, and other recent high-profile departures, say about the future of machine learning research in academia? A lot, I think.
First, some context. While Geoff is undoubtedly a giant in the field, he is only the latest in a string of departures to Google. Sebastian Thrun left Stanford to head Google’s autonomous car project in 2011. Andrew Ng has led several projects at Google, including the recent high-profile deep learning study in very large-scale computer vision. And in late 2010, Matt Welsh made news when he left his tenured faculty position at Harvard to join Google. Except for Matt Welsh, this trend has centered on large-scale machine learning, and the question of why was recently put to Andrew Ng during the panel discussion at the BigVision workshop at NIPS. Andrew’s answer was that doing machine learning at a large scale requires significant industrial engineering expertise, which does not exist in academic settings, Stanford included, and that a place like Google is simply much better equipped to carry out this function.
It’s easy to dismiss this as merely the commercialization and scaling of technology that was initially developed in academia. While this is in part true, it is also true that machine learning in general is becoming increasingly dependent on large-scale data sets. In particular, the recent successes in deep learning have all relied on access to massive data sets and massive computing power. I believe it will become increasingly difficult to explore the algorithmic space without such access, and without the concomitant engineering expertise that is required. The behavior of machine learning algorithms, particularly neural networks, depends in a nonlinear fashion on the amount of data and computing power used. An algorithm that appears to perform poorly on small data sets and with short training times can begin to perform considerably better when these limitations are removed. This, in a nutshell, has been the reason for the recent resurgence in neural networks. Even the pre-training breakthrough of 2006 appears not to be strictly necessary if enough data and computing power are thrown at the problem.
All this suggests that, without access to significant computing power, academic machine learning research will find it increasingly difficult to stay relevant. Such a shift to industrial research would not be without precedent. The development of computing itself provides a valuable lesson. In this post, I will attempt to draw parallels between computing in the 20th century and what I see as the current trajectory of machine learning and artificial intelligence.
Computing developed in three broad phases. The first, running from around the 1930s to the 1950s, was one that I would describe as embryonic research. The work was extremely foundational, had little obvious commercial value, and was done primarily in the halls of academia, supported by government and military spending. The heroes of that era are people like Alan Turing and John von Neumann, brilliant minds who could see far into the future, but who basically had to toil in obscurity because their work was so far from its ultimate realization that the true magnitude of their contribution would only be recognized decades later.
The second phase, roughly from the 1960s to the 1980s, was dominated by industrial research. In this period the field reached sufficient maturity that the work was better done at companies than in academia. The science had become obviously useful, even if it was still far from reaching its full potential. There was no longer a question of whether there was something there worth exploring, but it remained a science, with much research to be done. Because of its potential utility, companies that were not in the business (think Bell Labs, Xerox, IBM) could nonetheless afford to spend large wads of money on the problem. And in many ways this became the golden age of the field, with many interesting if not absolutely foundational problems to be solved. Large companies were also willing to put their weight behind the problem and propel the field forward with great speed. Curiously, however, this era seemed to generate fewer famous personalities, perhaps because the work was done by lots of people distributed across many industrial labs. The singular genius was less important. This phase also seemed to particularly favor large established companies, ones that were able to throw significant amounts of money and expertise at the problem without expecting an immediate return on investment. Startups would have had a hard time becoming commercially viable by relying only on computing, because the business was not there yet.
Finally came the third phase, the breakout of the science into a true technology used by the masses, spanning the 1980s to the 2000s (I am excluding the internet and mobile). This phase was less interesting from a scientific perspective, as most of the intellectual heavy lifting had already been done, with the remaining technical work centering on scalability. On the other hand, this period was the most challenging and exciting entrepreneurially, as a new generation of founders took the technology mainstream, eclipsing the older companies that had served as incubators for the science. Its heroes are thus not scientists, but corporate visionaries like Bill Gates and Steve Jobs, who became household names and brought about great public impact and broad societal change.
Which brings us back to machine learning. My claim is that the above is a sort of generic template for the development of any technology. First it starts as basic academic research, then it graduates into industrial research, and finally it becomes a real technology and is made mainstream by new startups. I think that if machine learning were viewed through the broader prism of artificial intelligence, as merely a stage in AI research, then what we are now witnessing is the transition of AI research from the first phase to the second. By AI I mean systems that are able not only to parameter-fit and self-learn input representations, but also to reason in a structured manner by exploring compositional models. From this perspective, the 1960s to 1980s were the embryonic period, where a lot of foundational but obscure work was done and no one outside the field took it seriously. It was unclear if there was anything there, if there would be any ultimate payoff. The second phase began in the 1990s and 2000s and will run to the 2020s or 2030s. It’s still very much science (AI, not ML), but large companies like Google see the potential, and to them the basic question of whether it has any utility has been answered in the affirmative, and so they can justify throwing money at the problem, even though it is not strictly in their business (one could argue whether Google is really in the business of AI or just ML). Because companies are now taking it seriously, the center of mass will shift heavily from academia to industry, with most of the interesting AI research occurring in industrial laboratories. And these industrial labs will be a safe and welcoming place for researchers for around two decades. The reason I think it will last that long is that, to take a very simple and naïve extrapolation, we are currently doing deep neural nets with tens of millions of nodes and billions of parameters.
The human brain has on the order of a hundred billion neurons and hundreds of trillions of synapses, and so, assuming Moore’s law holds and algorithmic developments keep up, we are about three orders of magnitude, or around 20 years, from getting a human brain (assuming computing power doubles every two years, which gives a 2^10-fold, or roughly thousandfold, increase in 20 years). By the 2030s the bulk of the science will be done, and in the succeeding decades the technology will be commercialized, by which time (2050s-2060s) we will have machine brains that are orders of magnitude smarter than human ones (!). This also suggests that the prime time for AI startups is not now, but some 20 years from now.
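For readers who want to check the back-of-the-envelope arithmetic, here is a minimal Python sketch of the doubling calculation. It assumes nothing beyond the two-year doubling period stated above; the function names are my own, chosen for illustration, and this is a toy extrapolation rather than a forecast.

```python
import math


def growth_after(years, doubling_period_years=2.0):
    """Multiplicative growth after `years`, assuming one doubling per period."""
    return 2 ** (years / doubling_period_years)


def years_to_scale(factor, doubling_period_years=2.0):
    """Years needed to grow by `factor`, assuming one doubling per period."""
    return math.log2(factor) * doubling_period_years


if __name__ == "__main__":
    # Ten doublings in 20 years: 2^10 = 1024, about three orders of magnitude.
    print(growth_after(20))

    # Conversely, a thousandfold increase takes just under 20 years.
    print(round(years_to_scale(1000), 2))
```

Changing `doubling_period_years` shows how sensitive the 20-year figure is to the assumed pace: at 18 months per doubling, the same thousandfold gap closes in about 15 years.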
I have made enough wild predictions and speculations in this post that the rope is now long enough to hang myself several times over, and so I will stop here. My point is simple though. Geoff Hinton’s move is part of a much larger shift. ML research, real fundamental research and not just scaling numerical methods to get the SVD of gargantuan matrices, is moving permanently to the industrial lab.
Update: This post got picked up by Hacker News with the requisite discussion here.
Update (12/9/13): Yann LeCun just announced that he’s joining Facebook.