Today’s post comes from Jenni McKienzie, our principal designer. Enjoy! -Team Banter


We recently visited my aunt with our newly-adopted dog in tow, which led her to share this story about her daughter:

At some point while dating her then-boyfriend, the daughter mentioned in conversation, “I’m a big dog lover!”

There’s a lack of consensus about the exact delivery of that line, but in hindsight it became clear that she meant:

I’m a big dog-lover!

But what her boyfriend heard was:

I’m a BIG-dog lover!

When they got married, guess what he gave her as a wedding gift? Hint: Daisy now weighs 95 pounds, is chocolate brown, and loves to jump in the pool.

It took several people several tries to explain to her why he got her a Labrador retriever.

This was a human-to-human conversation, and the interpretation still went wrong. If this all-too-common type of misunderstanding happens between people, how can we expect natural language processors to get it right in human-to-computer conversations?

While scientists and technology purveyors have made great strides in speech recognition and natural language processing, there’s still a long way to go, especially in the areas of disambiguation and understanding context. Should the recognizer return “I” or “eye”? “To,” “too,” or “two”?

These are the challenges facing assistant platforms like Google Assistant, Alexa, Cortana, and Siri. Sometimes they interpret a user’s words and intent correctly…and sometimes they don’t. But these assistants must handle a multitude of domains and contexts, which makes it much harder for them to get it right across everything they’re expected to cover, and that can be frustrating for users. This is most likely why a recent Nielsen Norman Group study concluded that intelligent assistants “have poor usability.”

But there’s a bit of a silver lining when it comes to conversational apps. When we design and develop a specific application, skill, or action, we gain an understanding of the domain and the contexts of use. (We do this through user-centered techniques like field observations and usability testing, but that’s a subject for a separate post.) And when our app is conversing with a user, we know what question the user is responding to. So we can usually adapt our design to handle that uncertainty… often even when the assistant’s recognizer gets a word or two wrong.
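To make that concrete, here’s a minimal sketch of the idea in Python, assuming a hypothetical skill that has just asked a question whose answer shape it knows. None of these names come from a real assistant SDK; they’re purely illustrative.

```python
# Illustrative sketch (not a real assistant SDK): because the app knows
# which question it just asked, it can recover from transcriptions the
# recognizer got slightly wrong.

# Homophones the recognizer commonly confuses with digits.
NUMBER_HOMOPHONES = {
    "to": "2", "too": "2", "two": "2",
    "for": "4", "fore": "4", "four": "4",
    "won": "1", "one": "1",
    "ate": "8", "eight": "8",
}

def interpret_reply(expected, transcription):
    """Map a raw transcription to a usable value, given what the current
    prompt expects ("number" or "yes_no" in this toy example)."""
    word = transcription.strip().lower()
    if expected == "number":
        # A number is expected, so "to"/"too" can safely be read as "2"
        # even though the recognizer picked the wrong homophone.
        if word in NUMBER_HOMOPHONES:
            return NUMBER_HOMOPHONES[word]
        return word if word.isdigit() else None
    if expected == "yes_no":
        if word in {"yes", "yeah", "yep", "sure"}:
            return "yes"
        if word in {"no", "nope", "nah"}:
            return "no"
        return None
    # Unknown context: pass the transcription through and let the dialog
    # re-prompt if the value can't be used.
    return word

# The app just asked "How many tickets do you need?"
print(interpret_reply("number", "too"))   # -> "2"
# The app just asked "Should I book them?"
print(interpret_reply("yes_no", "yep"))   # -> "yes"
```

Real skills get the same effect through things like constrained slot types and context-specific grammars; the point is simply that knowing the question narrows what the answer can be.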

So while the assistant platforms themselves might frustrate users, we’ve found that smart, domain- and context-informed design is critical to creating usable, enjoyable voice-first experiences.