Digital assistants still struggle with human conversation

Posted 1:27 a.m. Aug 18, 2018 — Updated 4:06 p.m. Aug 18, 2018

Keith Collins and Cade Metz, New York Times

Digital assistants like Amazon’s Echo can listen to you. And they can talk back. But that doesn’t mean they can carry on a good conversation.

As the devices that run these assistants become more commonplace — 39 million Americans now own one, according to a recent study — Amazon and competitors like Apple and Google foresee a day when you can chat with their assistants as you would with a friend.

We explored the conversational limits of these assistants, creating tests for the devices after consulting with the companies and a few artificial intelligence experts.

A digital assistant relies on many different technology systems, all working in concert. When you say something, one system tries to recognize each word and convert it to text. Another tries to understand the word’s meaning. A third system spits out responses. A fourth converts this response to digital speech. Other systems may also weigh in, but you get the point.
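To picture how those systems hand work to one another, here is a minimal sketch of the four stages chained together. Everything in it is an assumption made for illustration: the stage names, the `Turn` record and the keyword check are hypothetical stand-ins, and none of it reflects how Amazon, Apple or Google actually build their products.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One request as it moves through the (hypothetical) pipeline."""
    audio: str          # stand-in for raw audio; here it is already words
    text: str = ""
    intent: str = ""
    reply: str = ""
    speech: str = ""


def recognize_speech(turn: Turn) -> Turn:
    # Stage 1: convert sound to text. Stubbed: normalize the input string.
    turn.text = turn.audio.lower().strip()
    return turn


def understand(turn: Turn) -> Turn:
    # Stage 2: work out what the words mean. Stubbed with a keyword check.
    turn.intent = "add_to_list" if "shopping list" in turn.text else "unknown"
    return turn


def respond(turn: Turn) -> Turn:
    # Stage 3: choose a response for the recognized intent.
    replies = {"add_to_list": "What can I add for you?"}
    turn.reply = replies.get(turn.intent, "Sorry, I'm not sure.")
    return turn


def synthesize(turn: Turn) -> Turn:
    # Stage 4: convert the response to speech. Stubbed as tagged text.
    turn.speech = f"<spoken>{turn.reply}</spoken>"
    return turn


PIPELINE = [recognize_speech, understand, respond, synthesize]


def run_assistant(audio: str) -> Turn:
    turn = Turn(audio=audio)
    for stage in PIPELINE:
        turn = stage(turn)
    return turn
```

Calling `run_assistant("Add these things to my shopping list")` walks one request through all four stages. A mistake in an early stage carries into every later one, which is one reason a single misheard word can derail the whole exchange.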

Many of these systems are powered by neural networks, which are complex mathematical systems that can learn particular tasks by pinpointing patterns in large amounts of data.

Don’t expect the assistants to replace conversations with friends anytime soon. But the experiments — even in the moments when the assistants screwed up — showed what these assistants could one day become in our lives.

— Let’s make a list

Because these devices are not yet nimble enough to respond to anything you might ask, they must be honed for particular tasks.

Add these things to my shopping list …

AMAZON ECHO: What can I add for you?

APPLE HOMEPOD: I added “these things” to the shopping list.

GOOGLE HOME: What do you want to add?

Right away, the HomePod showed how difficult it can be for machines to understand natural language. The device recognized what we were doing, but misunderstood how we wanted to do it, and added an item called “these things” to a shopping list.

… guacamole, chips, tortillas.

AMAZON ECHO: “Guacamole chips tortillas” added to your shopping list.

APPLE HOMEPOD: No longer listening.

GOOGLE HOME: OK, I’ve added “guacamole” and “chips tortillas” to your shopping list.

If a friend asked what you wanted from the market and you said “guacamole, chips, tortillas,” he would understand them as three separate things. But the Google and Amazon devices didn’t quite recognize the pause between items as a verbal comma. Although these machines are pretty good at recognizing what you say, they may not completely understand it.
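One way to treat a pause as a verbal comma is to look at the silence between words. The sketch below assumes a speech recognizer that reports start and end times for each word; the function name, the timing format and the 0.4-second threshold are all hypothetical, and the article does not say whether any of the three devices works this way.

```python
def split_on_pauses(words, pause_threshold=0.4):
    """Group timed words into list items, starting a new item at long pauses.

    words: list of (word, start_seconds, end_seconds) tuples, in spoken order.
    pause_threshold: silence (in seconds) treated as a verbal comma; an
    illustrative guess, not a value taken from any real product.
    """
    items = []
    current = []
    previous_end = None
    for word, start, end in words:
        long_pause = previous_end is not None and start - previous_end > pause_threshold
        if long_pause and current:
            items.append(" ".join(current))
            current = []
        current.append(word)
        previous_end = end
    if current:
        items.append(" ".join(current))
    return items
```

With made-up timings such as `[("guacamole", 0.0, 0.7), ("chips", 1.3, 1.7), ("tortillas", 2.4, 3.0)]`, the gaps exceed the threshold and the function returns three items. Spoken quickly, "tortilla chips" would stay together as one.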

Even if you make a rather specific request, it can have multiple meanings. And that can be tough for an assistant to deal with.

Who won the Giants game last night?

AMAZON ECHO: On December 31st, the Giants beat the Redskins 18 to 10. They’ll play on August 9th at 7 p.m. at home against the Browns.

APPLE HOMEPOD: Which Giants? New York Giants, or San Francisco Giants?

GOOGLE HOME: Yesterday, the Giants played the Padres. The final score was 3 to 2, Giants, in zero innings.

If you’re a sports fan, you know there are two professional teams that call themselves the Giants. If someone asks about a Giants game in the summer, they’re probably talking about the San Francisco Giants, who play baseball.

Google and Apple likely have written an explicit rule for this particular situation.

Alexa guessed that we meant the New York Giants football team, which hadn’t played a game in seven months, but this may have been because we were asking the question in New York. Amazon sometimes generates responses based on your location.
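An explicit rule of the kind described above might look something like this sketch. It is purely illustrative: the function name is hypothetical, the month ranges are rough approximations of the baseball and football seasons, and nothing here is taken from any company's actual code.

```python
from datetime import date

# Rough, assumed season windows (month numbers); real schedules vary by year.
BASEBALL_MONTHS = {4, 5, 6, 7, 8, 9, 10}   # roughly April through October
FOOTBALL_MONTHS = {9, 10, 11, 12, 1}       # roughly September through January


def resolve_giants(query_date: date):
    """Guess which Giants a user means, or return None to ask a follow-up."""
    in_baseball = query_date.month in BASEBALL_MONTHS
    in_football = query_date.month in FOOTBALL_MONTHS
    if in_baseball and not in_football:
        return "San Francisco Giants"
    if in_football and not in_baseball:
        return "New York Giants"
    # Seasons overlap or neither is under way: better to ask "Which Giants?"
    return None
```

Under these assumptions, a question asked in August resolves to the baseball team, while one asked in September, when both seasons overlap, returns `None` and the assistant would need to ask for clarification, much as the HomePod did.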

Who do they play next?

AMAZON ECHO: First pitch of the NL West matchup between the Giants and the Diamondbacks is scheduled for tomorrow night at 9:40 p.m. This season, the Giants have bested the Diamondbacks in seven of the 12 games they’ve played.

APPLE HOMEPOD: The Giants battle the Diamondbacks tomorrow at 9:40 p.m.

GOOGLE HOME: Their next game is tomorrow at 9:40 p.m., when they will play the Diamondbacks.

When you chat with a friend, you use pronouns. To keep up, a digital assistant must recognize pronouns for what they are and map them back to words in a previous exchange. Engineers might train one neural network to identify what you are trying to do (build a shopping list, get the score of a baseball game) and train another to parse a sentence and identify the nouns, pronouns, verbs and other parts of speech. It’s like elementary school grammar classes for computers. Algorithms are learning to identify pronouns by analyzing hundreds of sentences, detecting patterns in the syntax.
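The mapping step can be illustrated with a deliberately simple stand-in. The sketch below does not use a neural network; it is a hand-written rule that remembers the last thing mentioned and swaps it in for a pronoun. The class name, the pronoun list and the method names are all hypothetical.

```python
import re

# A small, assumed set of pronouns; real systems handle far more cases.
PRONOUNS = {"they", "them", "it", "he", "she"}


class DialogueContext:
    """Remembers the most recent entity so later pronouns can refer to it."""

    def __init__(self):
        self.last_entity = None

    def remember(self, entity: str) -> None:
        self.last_entity = entity

    def resolve(self, utterance: str) -> str:
        # With nothing remembered, there is nothing to map a pronoun back to.
        if self.last_entity is None:
            return utterance

        def swap(match):
            word = match.group(0)
            return self.last_entity if word.lower() in PRONOUNS else word

        # Visit each word; punctuation is left untouched.
        return re.sub(r"[A-Za-z']+", swap, utterance)
```

After `remember("the San Francisco Giants")`, the follow-up "Who do they play next?" resolves to "Who do the San Francisco Giants play next?". A rule this crude breaks as soon as two candidates are in play, which is part of why engineers turn to learned models instead.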

We asked the devices where we could find the nearest coffee shop, then followed up with a vague command.

Give me directions.

AMAZON ECHO: At the moment, traffic to Ramini Espresso Bar & Cafe at 265 W. 37th St. looks a little sluggish. The fastest route takes about six minutes via W. 42nd Street and 7th Avenue.

APPLE HOMEPOD: I can’t help with directions here. Sorry about that.

GOOGLE HOME: The best way to get to Culture Espresso by car is via West 39th Street, and will take about nine minutes in heavy traffic.

Amazon trains neural networks to recognize that this kind of pronoun-free request refers back to a previous exchange. The algorithms analyze hundreds, perhaps thousands, of requests like this and learn to identify them on their own.

When generating responses, these assistants plug particular information into response templates. Sometimes, engineers will build this template to sound more conversational. That seemed to be the case here, when Alexa said that traffic was “sluggish.”

Don’t expect these devices to handle queries that require self-reflection or complex analysis.
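To make the response-template idea concrete, here is a minimal sketch of a fill-in-the-blanks reply. The template wording is modeled on Alexa's answer above, but the function names, the `delay_ratio` input and its thresholds are assumptions for illustration, not Amazon's implementation.

```python
# Wording modeled on the Echo's reply; the slots in braces are filled per request.
TEMPLATE = (
    "At the moment, traffic to {place} at {address} looks {traffic}. "
    "The fastest route takes about {minutes} minutes via {route}."
)


def describe_traffic(delay_ratio: float) -> str:
    """Turn a number into a conversational phrase.

    delay_ratio: current travel time divided by typical travel time.
    The cutoffs are hypothetical.
    """
    if delay_ratio < 1.1:
        return "clear"
    if delay_ratio < 1.5:
        return "a little sluggish"
    return "heavy"


def directions_reply(place, address, minutes, route, delay_ratio):
    return TEMPLATE.format(
        place=place,
        address=address,
        traffic=describe_traffic(delay_ratio),
        minutes=minutes,
        route=route,
    )
```

The conversational touch lives entirely in `describe_traffic`: the assistant is not forming an opinion about the traffic, only choosing a phrase that an engineer wrote in advance.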

Tell me something you do not know.

AMAZON ECHO: Sorry, I’m not sure.

APPLE HOMEPOD: My mistake. Please try again and tell me what were you looking for.

GOOGLE HOME: Hmm, well, I know the Muffin Man, but I don’t know if the Muffin Man knows me.

This is an incredibly difficult question for a computer to answer, unless it has been specifically programmed to handle it. Many times, these devices will reply with canned responses, as Google did with the Muffin Man bit.

Leading researchers have built experimental neural networks that learn to carry on richer conversations by analyzing reams of real (human) dialogue, like exchanges on Twitter or Facebook Messenger. But these neural networks can veer into nonsense and can reinforce the flaws of human conversation (gender bias, rudeness, maybe even racism). Bridging the gap between these experimental networks and practical products like the Amazon Echo will take some time.

For now, you will have to make do with the Muffin Man.