The rapid evolution of smart devices is pushing a new trend in user interface design: voice user interfaces. As people grow more comfortable talking to their phones, tablets, and wearables, brands that want to remain visible must invest in voice-friendly designs. See how voice user interface design has evolved, look at the most popular current interfaces, and learn how designers can optimize sites for voice-driven experiences.
Why Voice User Interface?
Speech is the first way humans learn to converse. From birth, we start recognizing sound as having meaning and constructing all other understanding from that knowledge. Humans have been refining the art of conversation since the dawn of time. Voice user interface is a logical step.
Conversational speech is much more complex than other forms of interaction. It involves processing several layers of cues from tone, volume, and body language and cross-referencing them with background knowledge and psychological adaptations. The challenge of creating voice user experience lies in identifying those interactions and providing a reliable translation.
Speech Recognition Development
Speech recognition systems have been around since 1952, when Bell Laboratories designed “Audrey.” This first voice-controlled system understood only digits, and only when a single voice spoke them. In 1962, IBM debuted “Shoebox,” which understood 16 English words. Since then, countries around the world have worked to develop technology that can turn spoken language into usable data.
In the 1970s, Harpy used beam search to construct sentences and process enough words to have the equivalent of a 3-year-old’s vocabulary. During the next decade, vocabulary recognition improved as systems developed the ability to process unknown sounds and predict what words they might be part of based on speech patterns.
Now, voice user interfaces are everywhere. Smartphones, smart homes, and televisions all recognize speech, and voice interaction will continue to grow. VoiceLabs estimates 24.5 million voice-driven devices will ship to homes in 2017, four times as many as last year.
Siri and Cortana have been providing assistance for years. Now devices such as Google Home and Amazon Echo allow consumers to select and launch their music playlist, order takeout, check movie showtimes, and more.
Speech with these devices still isn’t conversational. Users ask the interface to help with a task, such as turning off the lights, or to provide information, such as the weather. Task completion doesn’t change based on the user’s mood or current activities.
What Users Want
Artificial intelligence is already a mainstay in modern digital products and marketing, and speech technology has grown tremendously within the past decade. Because humans learn to speak at such an early age, by the time they’re adults, they don’t even notice the principles that regulate conversation. Much of the information we communicate is not in the actual words but in the context. For example, think about the following exchange:
Bob: How about a number two to go, and I’ll take a small fry with that.
Mary: Ketchup and mustard?
With those two lines, the reader infers that Bob is a customer at a fast food restaurant, and he’s probably ordering some type of sandwich with French fries. Mary is a restaurant employee asking what condiments he would like. If Bob just walked up to the counter and made his statement, he would not have needed to provide any other information in that context.
Bob doesn’t have to say he would like to buy the food. Using only 15 words, he lets Mary know everything he wants to eat and that he would like it in a package he can carry with him. With just three words, asking which condiments he would like, Mary implies she is willing to accommodate his request.
While people realize talking to their cell phone is different from talking to a person, they want to use the same principles they would in human communication. If they must restate requests and change their wording, they become frustrated and are more likely to abandon the interface for other forms of communication.
They don’t want to chit-chat with their devices any more than Bob wanted a long conversation with Mary. They want to accomplish their goal without wasting unnecessary time and energy. The challenge for designers is to recognize assumptions and context to create a user interface that allows human and machine to understand each other.
Voice Interface Design
Voice interactions involve three layers:
- A voice app, such as an Action on Google or a Skill for Amazon Alexa
- The artificial intelligence platform
- The computer, smartphone, or another device
Devices are always listening for cues indicating that their services are needed. When the user says, “Okay, Google,” the app is alerted that the user is about to ask it to complete a task. The device converts the user’s speech to text and uses that text to perform the desired action.
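The flow described above (wake phrase, speech converted to text, then an action) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real assistants detect the wake phrase in audio, while this sketch starts from text that has already been transcribed, and the names extract_command, ACTIONS, and handle are hypothetical rather than part of any Google or Amazon API.

```python
from typing import Optional

# Wake phrase taken from the article; matching it on text is an assumption,
# since real devices detect it in the audio stream.
WAKE_PHRASE = "okay google"


def extract_command(transcript: str) -> Optional[str]:
    """Return the command that follows the wake phrase, or None if it is absent."""
    normalized = transcript.lower().replace(",", "").strip()
    if not normalized.startswith(WAKE_PHRASE):
        return None  # no wake phrase: the device keeps listening and does nothing
    return normalized[len(WAKE_PHRASE):].strip()


# Hypothetical actions, keyed by the exact command text.
ACTIONS = {
    "turn off the lights": lambda: "Lights are off.",
    "check the weather": lambda: "It is sunny today.",
}


def handle(transcript: str) -> Optional[str]:
    """Map a transcript to a spoken response, or None when the device stays idle."""
    command = extract_command(transcript)
    if command is None:
        return None
    action = ACTIONS.get(command)
    if action is None:
        return "Sorry, I can't help with that yet."
    return action()


print(handle("Okay, Google, turn off the lights"))  # prints "Lights are off."
```

Exact-match lookup is the weakest part of this sketch. It fails as soon as the user phrases the request differently, which is the problem conversation design has to solve.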
Designers can structure voice apps around existing AI platforms. Google and Amazon both provide templates to quickly design apps with a minimal amount of expertise. Google advertises that it’s possible to build an app in 30 minutes that doesn’t require users to install anything to complete transactions, check order history, reorder past items, and more.
Amazon gives designers and developers access to more than 15,000 abilities or skills. The Alexa Skills Kit (ASK) lets developers select APIs, code samples, and tools to customize Alexa’s capabilities for their app.
As a result of improved access to voice app design processes, developers launch thousands of new voice apps every month. With increased competition, only well-designed apps will stand out from the rest.
How to Design for Users
Before you start browsing resources like the Alexa Skills Kit materials, approach the voice user experience as you would any other design task: ask how you can best provide what the user needs or desires. Break the process into several steps.
Identify Your App’s Value
People typically use a voice interface because it allows hands-free operation. Identify situations where your app will be better than using a cell phone or desktop device. If users will reach for it at home or in the car, what other factors might affect how they use it? Study what competitors offer and ask what will make your app better.
Determine the App’s Personality
When Mark Zuckerberg designed an AI to manage his home, he named it Jarvis and used Morgan Freeman’s voice for the interface. That implies his app has Tony Stark’s resourcefulness and Freeman’s soothing, powerful tone. The tone you choose for your app can communicate that your brand is trendy, studious, or laid back. Select one that closely aligns with your brand.
Offer Core Capabilities
What do you want your app to be able to do? Connect each capability to the value customers receive from your app. Glad makes trash bags and food storage solutions, so its app helps users locate leftovers and remember to empty the trash.
Compile Conversation Flows
This is where you teach your app how to respond to users and help them reach their goals. Give users clear information on what they can do so they respond with options the interface recognizes. If conversations involve on-screen content, make it appear at the same time as the voice prompts to avoid confusion. Make it easy to exit or start over if users find themselves in an unintended location. Keep exchanges as brief as possible to avoid misunderstandings.
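One way to picture a conversation flow is as a small state machine in which every state has a prompt and a set of replies it accepts. The Python sketch below is a hypothetical food-ordering flow, not taken from any platform's toolkit. The FLOW table, the step function, and the phrases "start over" and "exit" are all assumptions chosen for illustration. It shows three of the guidelines above: tell users what they can say, let them exit or start over from anywhere, and restate the options when a reply is not recognized.

```python
# Hypothetical ordering flow: each state has a prompt and the replies it accepts.
FLOW = {
    "start": {"prompt": "Would you like pizza or salad?",
              "next": {"pizza": "size", "salad": "confirm"}},
    "size": {"prompt": "Small or large?",
             "next": {"small": "confirm", "large": "confirm"}},
    "confirm": {"prompt": "Should I place the order? Say yes or no.",
                "next": {"yes": "end", "no": "start"}},
    "end": {"prompt": "Your order is placed. Goodbye.", "next": {}},
}


def step(state, reply):
    """Return (next_state, spoken_response) for one user reply."""
    reply = reply.lower().strip()
    # Global commands work in every state, so users are never stuck.
    if reply in ("exit", "cancel"):
        return "end", "Okay, cancelled. Goodbye."
    if reply == "start over":
        return "start", FLOW["start"]["prompt"]
    options = FLOW[state]["next"]
    if reply in options:
        next_state = options[reply]
        return next_state, FLOW[next_state]["prompt"]
    # Unrecognized reply: stay in the same state and restate the valid options.
    choices = " or ".join(options)
    return state, f"You can say {choices}. You can also say start over or exit."


print(step("start", "pizza"))  # prints ('size', 'Small or large?')
```

Keeping the flow in a table rather than in nested conditionals makes it easy to read the whole dialogue at a glance and to role-play it before building anything.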
Each conversational path could follow several variations. Alexa provides sample utterances and suggests including as many variations as possible to help the interface determine user intent.
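To see why more sample utterances help, consider a deliberately simple matcher. The Python sketch below scores a spoken phrase against each sample by word overlap (Jaccard similarity) and picks the best-scoring intent. This is an assumption made for illustration, not how Alexa resolves intents, and the intent names, the sample phrases, and the 0.5 threshold are all hypothetical.

```python
import re

# Hypothetical intents, each with several sample utterances.
SAMPLE_UTTERANCES = {
    "LightsOffIntent": [
        "turn off the lights",
        "switch the lights off",
        "lights off please",
    ],
    "WeatherIntent": [
        "what is the weather",
        "how is the weather today",
        "give me the forecast",
    ],
}


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, threshold=0.5):
    """Return the best-matching intent name, or None if nothing is close enough."""
    spoken = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, samples in SAMPLE_UTTERANCES.items():
        for sample in samples:
            sample_tokens = tokens(sample)
            union = spoken | sample_tokens
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(spoken & sample_tokens) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


print(match_intent("Turn the lights off"))  # prints "LightsOffIntent"
print(match_intent("play some jazz"))       # prints "None"
```

A phrase that matches no sample closely returns None, which is the cue to ask the user to rephrase instead of guessing. Each additional sample utterance widens the range of phrasings that clear the threshold.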
Refine and Test Conversation Flows
What sounds natural in print can sound artificial or ridiculous when read aloud. Role-play the dialogue with a user or use prototyping software to test conversation flows before creating your app.
The templates are simple to use; it’s the interaction that’s incredibly complex. Design for situations that include incoherent speech and muddled questions, and give users tools to clarify their requests. Reduce friction through user-centered design to add a voice interface that enhances each experience.
This post was written by Stephen Moyers, an online marketer, designer and avid tech-savvy blogger. He is associated with Los Angeles web design company SPINX Digital. He loves to write about web design, online marketing, entrepreneurship and much more. Apart from writing, he loves traveling & photography. Follow Stephen on Twitter & Google+.