For those unfamiliar with the Echo, its killer feature is that it’s always on and listening for its wake word; it’s the first truly dedicated vocal computing device. Every Echo owner knows it takes about 30 seconds to convert first-timers into believers.
“Alexa, turn on the lights,” I’ll say, using the Belkin WeMo integration. And in one command our apartment is illuminated.
Alexa, light of my life, fire of my loins.
The ease is jaw-dropping; it’s one of those magical tech moments. It quickly becomes obvious that voice will be the next computing interface: it’s the most effortless and human way to operate a computer.
In my four months with an Amazon Echo, I’ve noticed two interesting quirks changing how I interact with technology:
- Voice begets more voice: I’m increasingly outsourcing my tasks to Siri, even if the hit rate is lackluster.
- There’s a new psychology to interacting via voice: having to input commands deliberately clears away a ton of mental clutter.
These two observations have given me an idea of where voice is going in the short term.
Voice begets more voice
Becoming “spoiled” by Alexa has transformed how I use my iPhone.
I realized that around 40% of my smartphone usage is for tasks that translate extremely well to voice: basic things like checking the weather, playing music, setting timers, or doing lightweight searches like “when is the next Chicago Blackhawks game?” Even at this early stage, voice tech can do all of this well enough to keep me coming back. (The other 60%, like checking tweets and email or watching video, will still require a visual interface, for now.) And I can’t help but disagree with the Siri naysayers. It works well enough for most of that 40%, even if its hit rate is worse than Alexa’s.
But it took owning an Echo for me to realize how much time is wasted performing that 40% chunk of tasks. Take calling a friend as an example. After months of Alexa-level immediacy, you suddenly feel like a Neanderthal using your fingers to unlock your phone (1), opening the Contacts app (2), scrolling to find a name (3), tapping through to the chosen contact (4), and selecting their phone number (5).
Instead, I just hold the button (1) and tell Siri, “call Friend X” (2). Way easier, even if you aren’t counting strokes.
I’m not alone in wanting to avoid the hassle of app interfaces: 12% of users cite it as their primary reason for using voice, and 30% say they use voice because it’s faster. See below:
Hands-free situations, the #1 reason for adoption, also offer some great use cases. But in my experience they’re mostly limited to cooking. Sure, it’s difficult to overstate the utility of setting a cooking timer by voice, or being able to ask “how many cups in a quart?” when your hands are covered in olive oil. It’s also super useful to hear the weather while getting dressed in the morning. And telling Alexa to play “Eye of the Tiger” while doing a few push-ups certainly helps the cause.
But, for better or worse, my phone is within arm’s reach the other 95% of the time, so the key to adoption is outdoing mobile. Consequently, I think speed and ease will be the biggest drivers of adoption. Speed and ease begin with being able to interpret commands, and the Echo has a solid head start with a natural language processing system that actually works (mostly).
Perhaps the biggest value-add with the Echo is that it shipped with API integrations for developers. That meant that right out of the box we could run the Belkin WeMo integration, hook up a Spotify account, play verbal games of Jeopardy!, and play NPR clips via TuneIn whenever we ask “Alexa, what’s in the news?” To some extent, the failure of other voice tech like Siri is due to the fact that it can’t do anything outside Apple’s walled garden.
Any serious Echo competitor will need to push the boundary of what’s possible for the 40% of tasks that translate well to voice.
The New Psychology of Voice
One unforeseen benefit of voice tech is that it allows you to stay in the moment. That’s because it’s totally execution-focused: there’s no discovery feature, no autoscroll, and no promoted content, which are the main hooks of most of the apps in our lives.
Everyone loves knocking smartphones for making us dumber (a point with which I disagree). But there’s no denying that smartphones give rise to the paradox of choice: the more options we’re given, the more anxiety we feel. Too many siloed apps, too little time spent on the things that matter.
Having to think deliberately — and then verbally — about your command simply feels cleaner and more efficient. There’s less soul-suck, less of that Pavlovian compulsion to check and re-check feeds. With voice, I find I have more mental bandwidth, in large part because I’m not getting sucked into app interfaces. Part of this could be because we’re still in the “dumb” stage of voice tech, where devices only work upon user command. Down the line I imagine we’ll want our voice assistants to interrupt us.
The contradictory part here is that voice, indeed, makes you lazier and perhaps more impatient. Once you get a taste of voice, using any other interface feels like a total drag. But you’re able to quickly move on with your life.
A question going forward: how much interruption will we accept into our lives?
Looking Forward to Voice 2.0
Voice as a platform will naturally evolve. The language processing will improve. Killer apps will get made. Making the Echo (and its soon-to-be-released competitors) do more tricks is the easy part.
The difficult part will be crowding out the touch interface. I can already imagine the next Echo moving into mixed media, either by projecting onto the wall or taking control of the TV. Once that happens, there isn’t much standing in the way of scrolling through tweets, emails, and Snapchats by voice, things we never thought voice could do.
Right now it’s all centered around the smart home, or letting Alexa be the universal remote for your lights and your lightweight Google searches.
A natural application will be in the dashboards of connected cars. For radio, GPS, and hands-free Google queries, voice would be immensely useful. By law, most cars still require drivers to be stopped before they can enter a GPS destination.
And as we move beyond the smartphone computing cycle, voice will likely proliferate as the main user interface for wearables and IoT tech. Consumer wearables need to do more than just count our steps to be worthy of staying on our wrists.
Finally, the effect on the customer support industry will be significant. We’re currently converging on “consolidation technologies”: things like AI chatbots that offer multiple functions through a single point of access (such as talking to a virtual bank teller through an app). Future virtual assistants could free up a lot of mental bandwidth for the average person, especially once we trust them with complicated tasks like filing tax returns or bulk purchasing.
All of this amounts to a very exciting next few years. As others have theorized, the Echo’s long-run plan could be normalizing voice in the public psyche and, in effect, programming people to interact with a dedicated voice assistant. In just a few short months, I know I’ve become totally programmed.