The significance of Watson
February 13, 2011 by Ray Kurzweil
IBM's "Watson" Deep QA program, running on IBM Power7 servers. (Image: IBM T.J. Watson Research Labs)
In The Age of Intelligent Machines, which I wrote in the mid-1980s, I predicted that a computer would defeat the world chess champion by 1998. My estimate was based on the predictable exponential growth of computing power (an example of what I now call the “law of accelerating returns”) and my estimate of what level of computing was needed to achieve a chess rating of just under 2800 (sufficient to defeat any human, although lately the best human chess scores have inched above 2800).
I also predicted that when that happened we would either think better of computer intelligence, worse of human thinking, or worse of chess, and that if history was a guide, we would downgrade chess.
Deep Blue defeated Garry Kasparov in 1997, and indeed we were immediately treated to rationalizations that chess was not really exemplary of human thinking after all. Commentaries pointed out that Deep Blue’s feat just showed how computers were good at dealing with high-speed logical analysis and that chess was just a matter of dealing with the combinatorial explosion of move-countermoves. Humans, on the other hand, could deal with the subtleties and unpredictable complexities of human language.
I do not entirely disagree with this view of computer game playing. The early success of computers with logical thinking, even at such tasks as solving mathematical theorems, showed what computers were good for. Recall that CMU’s “General Problem Solver” solved a mathematical theorem in the 1950s that had eluded Russell and Whitehead in their Principia Mathematica, one of the early successes of the AI field that led to premature confidence in AI.
Computers could keep track of vast logical structures and remember enormous databases with great accuracy. Search engines such as Google and Bing continue to illustrate this strength of computers.
Indeed no human can do what a search engine does, but computers have still not shown an ability to deal with the subtlety and complexity of language. Humans, on the other hand, have been unique in our ability to think in a hierarchical fashion, to understand the elaborate nested structures in language, to put symbols together to form an idea, and then to use a symbol for that idea in yet another such structure. This is what sets humans apart.
That is, until now. Watson is a stunning example of the growing ability of computers to successfully invade this supposedly unique attribute of human intelligence. If you watch Watson’s performance, it appears to be at least as good as the best “Jeopardy!” players at understanding the nature of the question (or I should say the answer, since “Jeopardy!” presents the answer and asks for the question, which I always thought was a little tedious). Watson is able to then combine this ability to understand the level of language in a “Jeopardy!” query with a computer’s innate ability to accurately master a vast corpus of knowledge.
I’ve always felt that once a computer masters a human’s level of pattern recognition and language understanding, it would inherently be far superior to a human because of this combination.
We don’t know yet whether Watson will win this particular tournament, but it won the preliminary round and the point has been made, regardless of the outcome. There were chess machines before Deep Blue that just missed defeating the world chess champion, but they kept getting better, and passing the threshold of defeating the best human was inevitable. The same is true now with “Jeopardy!”
Yes, there are limitations to “Jeopardy!” Like all games, it has a particular structure and does not probe all human capabilities, even within understanding language. Already commentators are beginning to point out the limitations of “Jeopardy!,” for example, that the short length of the queries limits their complexity.
For those who would like to minimize Watson’s abilities, I’ll add the following. When human contestant Ken Jennings selects the “Chicks dig me” category, he makes a joke that is outside the formal game by saying “I’ve never said this on TV, ‘chicks dig me.’” Later on, Watson says, “Let’s finish Chicks Dig Me.” That’s also pretty funny and the audience laughs, but it is clear that Watson is clueless as to the joke it has inadvertently made.
However, Watson was never asked to make commentaries, humorous or otherwise, about the proceedings. It is clearly capable of dealing with a certain level of humor within the queries. If suitably programmed, I believe that it could make appropriate and humorous comments also about the situation it is in.
It is going to be more difficult to seriously argue that there are human tasks that computers will never achieve. “Jeopardy!” does involve understanding complexities of humor, puns, metaphors and other subtleties. Computers are also advancing on a myriad of other fronts, from driverless cars (Google’s cars have driven 140,000 miles through California cities and towns without human intervention) to the diagnosis of disease.
Watson on your PC or mobile phone?
Watson runs on 90 servers, although it does not go out to the Internet. When will this capability be available on your PC? It was only five years between Deep Blue in 1997, which was a specialized supercomputer, and Deep Fritz in 2002, which ran on eight personal computers, and did about as well.
This reduction in the size and cost of a machine that could play world-champion level chess was due both to the ongoing exponential growth of computer hardware and to improved pattern recognition software for performing the key move-countermove tree-pruning decision task. Computer price-performance is now doubling in less than a year, so 90 servers would become the equivalent of one in about seven years. Since a server is more expensive than a typical personal computer, we could consider the gap to be about ten years.
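The arithmetic behind these estimates can be sketched in a few lines. This is only an illustration, assuming a fixed doubling time of exactly one year (the essay says "less than a year," so the results are upper bounds); the function names `years_until_equivalent` and `implied_cost_ratio` are hypothetical and introduced here for clarity.

```python
import math


def years_until_equivalent(num_machines: float, doubling_time_years: float = 1.0) -> float:
    """Years until one machine matches `num_machines` of today's machines.

    Assumes price-performance doubles every `doubling_time_years`
    (an assumed constant rate, not a measured one).
    """
    doublings_needed = math.log2(num_machines)
    return doublings_needed * doubling_time_years


def implied_cost_ratio(total_years: float, num_machines: float,
                       doubling_time_years: float = 1.0) -> float:
    """Server-to-PC ratio implied by a total gap of `total_years`.

    A gap of `total_years` corresponds to 2 ** (total_years / doubling_time)
    in price-performance; dividing by the number of servers gives the
    per-server factor that the estimate implicitly assumes.
    """
    return 2 ** (total_years / doubling_time_years) / num_machines


if __name__ == "__main__":
    # 90 servers -> one server: log2(90) doublings, roughly six and a half.
    print(f"90 servers to 1: {years_until_equivalent(90):.2f} years")
    # A ten-year gap to a PC implies each server counts as about 11 PCs.
    print(f"Implied server/PC ratio: {implied_cost_ratio(10, 90):.1f}")
```

With a one-year doubling time, 90 servers shrink to one in roughly six and a half years, consistent with "about seven." The ten-year figure then corresponds to treating a server as roughly eleven times a personal computer, which is an inference from the numbers rather than something the text states.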
But the trend is definitely moving towards cloud computing, in which supercomputer capability will be available in bursts to anyone, in which case Watson-like capability would be available to the average user much sooner. I do expect the type of natural language processing we see in Watson to show up in search engines and other knowledge retrieval systems over the next five years.
Passing the Turing test
How does all of this relate to the Turing test? Alan Turing based his eponymous test entirely on written human language, reflecting his (in my view accurate) insight that human language embodies all of human intelligence. In other words, there are no simple language tricks that would enable a computer to pass a well-designed Turing test. A computer would need to actually master human levels of understanding to pass this threshold.
Incidentally, properly designing a Turing test is not straightforward, and Turing himself left the rules purposely vague. How qualified does the human judge need to be? How human does the judge need to be (for example, can he or she be enhanced with nonbiological intelligence)? How do we ensure that the human foils actually try to trick the judge? How long should the sessions be?
Mitch Kapor and I bet $20,000 ($10,000 each), with the proceeds to go to the charity of the winner’s choice, on whether a computer would pass a Turing test by 2029. I said yes and he said no. We spent considerable time negotiating the rules, which you can see here:
What does this achievement with “Jeopardy!” tell us about the prospect of computers passing the Turing test? It certainly demonstrates the rapid progress being made on human language understanding. There are many other examples, such as CMU’s Read the Web project, which has created NELL (Never Ending Language Learner), which is currently reading documents on the Web and accurately understanding most of them.
With computers demonstrating a basic ability to understand the symbolic and hierarchical nature of language (a reflection of the inherently hierarchical nature of our neocortex), it is only a matter of time before that capability reaches Turing-test levels. Indeed, if Watson’s underlying technology were applied to the Turing test task, it should do pretty well. Consider the annual Loebner Prize competition, one version of the Turing test. Last year, the best chatbot fooled the human judges 25 percent of the time, and the competition requires only a 30 percent level to pass.
Given that contemporary chatbots do well on the Loebner competition, it is likely that such a system based on Watson technology would actually pass the Loebner threshold. In my view, however, that threshold is too easy. It would not be likely to pass the more difficult threshold that Mitch Kapor and I defined. But the outlook for my bet, which is not due until 2029, is looking pretty good.
An important part of engineering a system that will pass a proper Turing test is that it will need to dumb itself down. In a movie I wrote and co-directed, The Singularity is Near, A True Story about the Future, an AI named Ramona needs to pass a Turing test, and indeed she has this very realization. After all, if you were talking to someone over instant messaging and they seemed to know every detail of everything, you’d realize it was an AI.
What will be the significance of a computer passing the Turing test? If it is really a properly designed test, it would mean that this AI is truly operating at human levels. And I for one would then regard it as human. I’m expecting this to happen within two decades, but I also expect that when it does, observers will continue to find things wrong with it.
By the time the controversy dies down and it becomes unambiguous that nonbiological intelligence is equal to biological human intelligence, the AIs will already be thousands of times smarter than us. But keep in mind that this is not an alien invasion from Mars. We’re creating these technologies to extend our reach. The fact that farmers in China can access all of human knowledge with devices they carry in their pockets is a testament to the fact that we are doing this already.
Ultimately, we will vastly extend and expand our own intelligence by merging with these tools of our own creation.