The Voice Design Guide can help brand managers and agencies understand the non-technical elements involved in creating a voice AI experience. It’s based on our own experiences developing on IBM Watson and Amazon Alexa.
Voice Experience Strategy
Start your journey by asking yourself where voice AI can truly improve the customer experience. Don’t start with “solving a problem” unless the problem is the customer/user experience.
It’s tempting to look at how voice could solve efficiency or scaling challenges, and no doubt there are cases where it can. But then you’re solving for metrics instead of people.
Netflix, the iPhone, and Salesforce didn’t solve unsolved problems; existing products already did what they do. Instead, each reduced friction through a better experience. Focus on the customer/user experience.
Voice. The voice of your brand can go one of three ways.
- Robot: Use the stock synthetic voice(s) provided by the platform.
- Human: A voice actor or person you know reads the dialog, which then replaces the platform’s native voice assistant voice.
- Synthetic human: A person reads a training script that enables a computer to understand the base elements of that person’s voice. The computer then replicates the human voice, turning written dialog into your voice assistant’s voice.
Sound. Will you have an audio logo or a score? An audio logo is a few musical notes or a short jingle associated with your brand. A score is more elaborate sound/music that accompanies your voice experience.
Visual. Will there be a visual dimension to your voice experience? An avatar can give users a mental image for voice-only situations such as in the car or when using a smart speaker. It’s not mandatory but worth considering if your brand has a recognized avatar or you plan to personify your voice experience with an image.
Voice Search Optimization
Like SEO, VSO is the practice of positioning your voice assistant to be suggested when a user asks about the product or service your organization provides. Google uses its search rankings; Alexa draws on the keywords/phrases that skill makers attach to their skills and on products and services within the Amazon ecosystem; Cortana uses Bing; Siri draws from a variety of sources. Be sure to understand and leverage how your voice platforms surface voice search results.
Conversation Design
Conversation design has two components: the interaction model and the conversation itself.
The interaction model is how your voice assistant works and what it does. For a pizza-ordering voicebot, the essence of an interaction model might be: “our model will gather information about the pizza and the transaction, including the size, toppings, payment info, and the customer’s name and delivery address.”
The conversation embodies the user context and dialog that’s used to gather the pizza information.
The user context is vitally important because it’s the foundation of your use case: the who, what, when, where, why, and how. A voice experience that doesn’t sync with the user’s context will fail.
The dialog can come from existing sources such as phone transcripts, your dialog team, or even you, if you’re a team of one. Beta testing irons out conversational friction from wordiness or bot pronunciation issues.
Voice AI Development Platforms
These are the main voice AI development platforms.
Google Assistant has the largest installed voice assistant base, thanks to its presence on a billion Android phones. Assistant uses Google’s search engine data to surface the brands it recommends; something to consider if you have strong search rankings. It’s also widely recognized as the most capable voice assistant.
Alexa is the smart speaker champ, with about 100 million Echo devices in homes as of Q1 2019. Alexa is a voice-only experience, while the other platforms also facilitate chat, i.e., users can type when privacy is desired.
Siri is currently a B2C outlier because it must be linked to a corresponding App Store app. Siri is also bound by the limited number of use cases under which Apple allows development. If you meet those requirements, adding voice to your iOS app gives users a native Siri voice experience; worth considering, since Siri is on 500 million iPhones and iPads (recent-model Macs too).
Cortana is on 145 million Windows PCs. While not native to a smartphone or smart speaker, Cortana is integrated with Alexa. That makes it possible for a Windows PC user to ask Cortana to ask Alexa to do something (turn on the lights at home) and for Alexa users to ask Alexa to ask Cortana to create an Outlook calendar event, etc.
IBM Watson is a B2B suite of cognitive services. Watson is a popular white-label choice and is used by enterprises that want total control of the user experience and functionality. Although it’s B2B, Watson isn’t any more difficult to develop on than the B2C platforms. Watson also has data privacy options that are unavailable on the B2C platforms.
Define business objectives first, then examine the platforms in that context.
Expectations and KPIs
A key performance indicator (KPI) for a voice experience depends on the use case. If your voicebot is designed to handle sales inquiries, then a good KPI might be the voice assistant’s conversion rate versus the conversion rates of a web form or email.
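A conversion-rate comparison like the one above is straightforward arithmetic: conversions divided by inquiries, computed per channel. The channel names and counts below are invented purely for illustration.

```python
# Hypothetical inquiry and conversion counts per channel, for illustration only.
channels = {
    "voice assistant": {"inquiries": 400, "conversions": 52},
    "web form": {"inquiries": 1200, "conversions": 132},
    "email": {"inquiries": 800, "conversions": 72},
}

# Conversion rate = conversions / inquiries, reported per channel.
for name, c in channels.items():
    rate = c["conversions"] / c["inquiries"]
    print(f"{name}: {rate:.1%} conversion rate")
```

With these made-up numbers, the voice assistant converts at 13.0% versus 11.0% for the web form and 9.0% for email; your real figures would come from your own analytics.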
Today, voice is neither a mature technology nor a mature user experience. At this early stage, your KPIs serve as a baseline against which future performance can be compared.
This is the era of exploring how and where voice AI can improve the CX/UX and business workflows. Maybe you’ll find a clear path to ROI today, but uncertainty should not stop you from exploring. Voice AI is a pivotal technology of the same magnitude that ecommerce was 20 years ago.
Today’s voice experience innovators are tomorrow’s market share leaders.