The buzzer factor: did Watson have an unfair advantage? [UPDATE]
Does Watson have an unfair advantage over humans because it can signal its response instantly? It seemed that way in the three “Jeopardy!” TV shows this week, especially Wednesday night, as Watson proceeded to totally own the humans.
FEBRUARY 24, 2011:
From Final Jeopardy: Man vs Machine and the Quest to Know Everything by Stephen Baker:
“After the match, Jennings and Rutter stressed that the computer still had cognitive catching up to do. They both agreed that if ‘Jeopardy’ had been a written test — a measure of knowledge, not speed — they both would have outperformed Watson. ‘It was its buzzer that killed us,’ Rutter said.”
Note that this appears to contradict statements made by IBM, cited below.
Watson's "hand" is a mechanical device that functions like a human contestant's buzzer. (Photo: IBM)
It comes down to human vs. computer reaction time to the light that signals that players can press the button. (After the host finishes reading the clue, a light tells the contestants that the buzzers are armed, so they all get the same start cue.) According to one literature review, the accepted mean simple reaction time to a light stimulus for college-age individuals is about 190 ms (0.19 sec). (Online tests let you measure your own reaction time to a traffic-light change or to a dot changing color.)
What about Watson? Its reaction time is just five to ten milliseconds, according to a white paper that IBM provided to me, written by Dr. David Ferrucci, who heads the Watson computer design team. At the fast end, that's a 38:1 advantage (190 ms vs. 5 ms)!
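The ratio depends on which end of IBM's five-to-ten-millisecond range you take. Here is a quick check of the arithmetic, treating both figures as simple reaction times to the enable light. That is an assumption: the 190 ms number is a lab average for humans, while Watson's figure is a hardware and software latency.

```python
# Mean simple reaction time to a light stimulus, college-age adults (literature-review figure above)
HUMAN_MS = 190.0
# Watson's buzz latency range, per the Ferrucci white paper
WATSON_MIN_MS, WATSON_MAX_MS = 5.0, 10.0


def speed_ratio(human_ms: float, machine_ms: float) -> float:
    """How many times faster the machine responds than the human."""
    return human_ms / machine_ms


best = speed_ratio(HUMAN_MS, WATSON_MIN_MS)   # Watson at its fastest
worst = speed_ratio(HUMAN_MS, WATSON_MAX_MS)  # Watson at its slowest
print(f"Watson is {worst:.0f}x to {best:.0f}x faster than a reacting human")
```

So the "38:1" figure is the best case; at ten milliseconds the edge is still about 19:1. As the next paragraphs show, though, top players don't simply react.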
But wait — Ferrucci throws us a curve: “By anticipating and ‘timing the buzz,’ top players do not have to wait for the enable lights,” he reveals. “Rather, they start their neurons and muscles going well ahead of the very end of the clue.
They win the buzz consistently, with remarkable speeds under twenty, ten and even under four milliseconds after the enable lights go on! They can and have timed the buzzer faster than Watson’s fastest possible speed to the buzz.”
Back on the stage, disconnected from any life-lines, Watson goes head-to-head with “Jeopardy!” champs. The clue appears, and the process begins: Watson deeply analyzes the question, explores millions of pieces of text, and considers hundreds of possible answers, gathering and analyzing all the evidence it can for each possibility. Watson computes a strong confidence in its top answer. It does ALL this before the host finishes reading the clue.
But so did Sue [another contestant in pre-trials]. Sue times the buzz through learned anticipation, as do all good human ‘Jeopardy!’ champs. She’s got the rhythm, she is synced-up to the cadence of Alex’s voice, she knows what the penultimate syllable will sound like, she tenses her muscles, readies her mind, the answer she computed seemingly ages ago is on the tip of her tongue, she is anticipating….anticipating.
The milliseconds are zipping by, the end of the clue is being spoken and then with uncanny accuracy she anticipates the [buzz delay] and finally her own electromechanical delay and bam! — Sue’s electrons steal the buzz just two milliseconds after the enable lights go on. Three whole milliseconds later, Watson’s fastest signal arrives — too late. Watson was beaten to the buzz.
— David Ferrucci white paper
It’s actually even worse for Watson, Ferrucci goes on. “It spends its 2,800 compute cores trying to understand and answer the question in enough time to get any chance to buzz in. The current version of Watson does this about 90%-95% of the time. Depending on the specific game, Watson may not have enough time to even decide to buzz for the about 5%-10% of the clues. If Watson gets ready to buzz in fast enough, it can leverage its speedy hand, but it still does not buzz perfectly; Watson has to manage its risk.” The risk here is that a contestant who buzzes before the light is penalized with a quarter-second (250 ms) lockout, during which that contestant cannot buzz in.
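The race in Ferrucci's "Sue" anecdote, combined with the early-buzz penalty, can be sketched as a toy model. The 2 ms and 5 ms press times come from the anecdote and the 250 ms lockout from the rule just described. The assumption that a locked-out player presses again the instant the lockout ends is mine (and generous to the early presser), and the function names are hypothetical; the real "Jeopardy!" lockout hardware may behave differently.

```python
from typing import Dict, Optional

LOCKOUT_MS = 250.0  # quarter-second penalty for buzzing before the enable light


def effective_buzz_ms(press_ms: float) -> float:
    """Time (ms after the enable light) at which a press can first register.

    Negative press_ms means the contestant pressed early. Simplifying
    assumption: they are locked out for LOCKOUT_MS and press again the
    instant the lockout ends.
    """
    if press_ms < 0:
        return press_ms + LOCKOUT_MS
    return press_ms


def buzz_winner(presses: Dict[str, Optional[float]]) -> Optional[str]:
    """Return the contestant whose press registers first, or None if nobody buzzes.

    A value of None means that contestant did not buzz at all (for Watson,
    low confidence or running out of time). Ties go to the first entry.
    """
    candidates = {name: effective_buzz_ms(t) for name, t in presses.items() if t is not None}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


print(buzz_winner({"Sue": 2.0, "Watson": 5.0}))    # Sue: 2 ms beats Watson's 5 ms
print(buzz_winner({"Sue": -1.0, "Watson": 5.0}))   # Watson: Sue is locked out until 249 ms
print(buzz_winner({"Sue": None, "Watson": None}))  # None: a triple stumper
```

The second call shows why timing the buzz is a gamble: pressing even 1 ms early hands the clue to a slower but legal press.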
“According to our estimates, in Ken Jennings’ 74-game winning streak, he buzzed in first for about 60% of the questions in a game. This was his average ‘buzzer-ability’ over the entire streak. He had games where he buzzed in first for over 70% and even up to 80% of the board! On a game like that, he totally dominated, leaving only 20% (not counting rebounds and triple stumpers) for the two other players to split.” (See “Timing analysis” below for more details.)
OK, but what can we realistically conclude about Watson’s capabilities from this demonstration? After all, it’s TV entertainment, not a controlled lab experiment. Watson was capable of giving correct responses to the clues. But we can’t conclude that Watson knew more, or “thought” faster, than the human contestants. Contestant fatigue, distractions, and stress levels could also have affected the outcome. Some might also conclude that the clues were relatively simple (that’s how many of them seemed to me), which could have made reaction time a bigger factor in scoring than the contestants’ knowledge. A definitive test would eliminate the buzzer.
In other words, what are the real capabilities of Watson (or, more importantly, of DeepQA, Watson’s question-answering software) compared to humans? I ran across an in-depth discussion of DeepQA in D. Ferrucci et al., “Building Watson: An Overview of the DeepQA Project,” AI Magazine, Fall 2010, that offers some clues.
Next week, we’ll look at where IBM plans to take DeepQA next. Hint: not a TV show.
Timing analysis
At the same time the clue is displayed on a screen to the contestants, it is transmitted electronically to Watson. It is sent in the same format that appears on the Jeopardy! display (usually all uppercase letters) but is sent as a single line of text. Given the speed of light and of electronic transmission, the clue text hits Watson’s chips at essentially the same time that it hits the human players’ retinas. And then the language processing begins for both human and machine.
The enable light travels to the human eye at roughly the same speed the enable signal travels to Watson’s chips. Electronic signals over wires travel a bit slower than light and Watson is a bit further away from the signal source so perhaps the humans may have a few billionths of a second advantage….
After Watson gets the enable signal, the third and most discussed interface comes into play: the physical buzzer. If and only if Watson has computed an answer with a sufficiently good confidence will it send an electronic signal to its hand.
Yes, Watson has a “hand” that it uses to physically depress the “Jeopardy!” buzzer. Watson’s hand looks like a clear, Plexiglas, cylindrical soda can with a few metal screws in the top and a wire extending from the bottom that is connected to Watson’s Front-End Controller. The mechanical hand wraps around the Jeopardy! buzzer which is inserted in the bottom of the Plexiglas cylinder and is held in place by a clamp. Watson’s hand uses a solenoid to physically press the same button that the humans must press. After all, if the humans have to physically push plastic down onto a spring, so should Watson. It should not have a direct electronic link. Fair enough.
Watson’s hand is pretty fast in terms of raw speed — it takes somewhere between five and ten milliseconds for Watson to activate the buzzer once it decides to answer. This delay is affected by the speed of the solenoid and other small, sometimes hard-to-pin-down delays inherent in the software stack. This refers to layers of software through which the enable signal and then the decision to buzz must go to get to Watson’s hand. Watson does not, however, win the buzz all the time, not by a long shot.
— David Ferrucci white paper
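Ferrucci's timeline can be sketched in a few lines: the enable signal propagates (nanoseconds), Watson buzzes only if its confidence clears a threshold, and the hand then adds its five-to-ten-millisecond delay. This is a toy model, not IBM's Front-End Controller logic. The distances, the 0.66 cable velocity factor (typical of common coaxial cable), and the 0.5 confidence threshold are all hypothetical values chosen for illustration; in DeepQA the buzz threshold reportedly depends on game state.

```python
from typing import Optional

C_M_PER_S = 299_792_458.0  # speed of light in vacuum (in air it is within about 0.03%)


def propagation_ns(distance_m: float, velocity_factor: float = 1.0) -> float:
    """Travel time in nanoseconds over distance_m at velocity_factor * c."""
    return distance_m / (velocity_factor * C_M_PER_S) * 1e9


def watson_press_ms(confidence: float,
                    threshold: float,
                    hand_delay_ms: float,
                    signal_delay_ns: float) -> Optional[float]:
    """Milliseconds from the enable signal leaving its source until Watson's
    buzzer is pressed, or None if confidence is below the buzz threshold."""
    if confidence < threshold:
        return None  # not confident enough: the hand is never signaled
    return signal_delay_ns / 1e6 + hand_delay_ms


# Hypothetical geometry: a contestant 4 m from the enable light, and 4 m of
# cable to Watson with a velocity factor of 0.66.
eye_ns = propagation_ns(4.0)
wire_ns = propagation_ns(4.0, 0.66)
print(f"light to eye: {eye_ns:.1f} ns, cable to Watson: {wire_ns:.1f} ns")

# Even at the fast end of the 5-10 ms hand delay, the propagation term is
# only a few millionths of the total.
print(watson_press_ms(0.9, 0.5, 5.0, wire_ns))
print(watson_press_ms(0.3, 0.5, 5.0, wire_ns))  # None: Watson stays silent
```

Under these assumptions the humans' head start is a few nanoseconds, which is why the interesting part of the race is the millisecond-scale hand delay and the humans' anticipation, not the wiring.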