Voice commands aren't only for assistants like Google Assistant or Alexa. They can also be added to your mobile and desktop apps, offering extra functionality and even some fun for your end users. And adding voice commands or voice search to your apps can be very easy. In this article, we'll use the Web Speech API to build a voice-controlled book search application.
The complete code for what we’ll build is available on GitHub. And for the impatient, there’s a working demo of what we’ll build at the end of the article.
Introduction to the Web Speech API
Before we get started, it’s important to note that the Web Speech API currently has limited browser support. To follow along with this article, you’ll need to use a supported browser.
You can check current data on support for the `SpeechRecognition` API across the major browsers on caniuse.com.
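Given the patchy support, it's worth feature-detecting before using the API. As a minimal sketch (the `resolveSpeechRecognition` helper name is ours, for illustration), detection could look like this:

```javascript
// A minimal feature-detection sketch (the helper name is ours, not from
// this article). It prefers the unprefixed constructor and falls back to
// the webkit-prefixed one that Chrome exposes.
function resolveSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// In a browser you would pass `window`; plain objects stand in here.
console.log(resolveSpeechRecognition({ webkitSpeechRecognition: function () {} }) !== null); // true
console.log(resolveSpeechRecognition({}) !== null); // false
```

If the helper returns `null`, you can hide the voice UI and fall back to a plain text input.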
First, let's see how easy it is to get the Web Speech API up and running. (You might also like to read SitePoint's introduction to the Web Speech API and check out some other experiments with the Web Speech API.) To start using the Speech API, we just need to instantiate a new `SpeechRecognition` class to allow us to listen to the user's voice:

```javascript
const SpeechRecognition = webkitSpeechRecognition;
const speech = new SpeechRecognition();
speech.onresult = event => {
  console.log(event);
};
speech.start();
```
We start by creating a `SpeechRecognition` constant, which is set to the browser's vendor-prefixed `webkitSpeechRecognition` global. After this, we create a `speech` variable that will be the new instance of our `SpeechRecognition` class. This will allow us to start listening to the user's speech. To be able to handle the results from a user's voice, we need to create an event listener that will be triggered when the user stops speaking. Finally, we call the `start` function on our class instance.
When running this code for the first time, the user will be prompted to allow access to the mic. This is a security check that the browser puts in place to prevent unwanted snooping. Once the user has accepted, they can start speaking, and they won't be asked for permission again on that domain. After the user has stopped speaking, the `onresult` event handler function will be triggered.

The `onresult` event is passed a `SpeechRecognitionEvent` object, whose `results` property is a `SpeechRecognitionResultList` containing `SpeechRecognitionResult` objects. Each `SpeechRecognitionResult` is itself an array-like list of recognition alternatives, and the first item in this list contains the transcript of what the user said.
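To make that nesting concrete, here's a small sketch that digs the transcript out of a plain object shaped like a `SpeechRecognitionEvent` (the `mockEvent` data is invented for illustration; in the browser you'd receive the real event in your `onresult` handler):

```javascript
// A plain object mimicking the shape of a SpeechRecognitionEvent:
// `results` stands in for the SpeechRecognitionResultList, each entry is
// a SpeechRecognitionResult, and each result holds alternatives with a
// `transcript` property.
const mockEvent = {
  results: [
    [{ transcript: "hello world", confidence: 0.95 }]
  ]
};

// The first alternative of the first result holds the spoken text.
const transcript = mockEvent.results[0][0].transcript;
console.log(transcript); // "hello world"
```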
Using Web Speech in React
Using what we've already learned, it's a simple process to add the Web Speech API to a React application. The only issue we have to deal with is the React component lifecycle. First, let's create a new project with Create React App, following its getting started guide. This assumes that Node is installed on your machine:

```bash
npx create-react-app book-voice-search
cd book-voice-search
npm start
```
Next, we replace the `App` file with the code below to define a basic React component. Then we can add some speech logic to it:

```javascript
// App.js
import React from 'react';

const App = () => {
  return (
    <div>
      Example component
    </div>
  );
};

export default App;
```
This simple component renders a div with some text inside it. Now we can start adding our speech logic to the component. We want to build a component that creates the speech instance, then uses this inside the React lifecycle. When the React component renders for the first time, we want to create the speech instance, start listening to results, and provide the user a way to start the speech recognition. We first need to import some React hooks (you can learn more about the core React hooks here), some CSS styles, and a mic image for our user to click:
```javascript
// App.js
import { useState, useEffect } from "react";
import "./index.css";
import Mic from "./microphone-black-shape.svg";
```
After this, we'll create our speech instance. We can use what we learned earlier when looking at the basics of the Web Speech API. We have to make a few changes to the original code we pasted into the browser developer tools. Firstly, we make the code more robust by adding browser support detection. We can do this by checking if the `webkitSpeechRecognition` class exists on the `window` object. This will tell us if the browser knows of the API we want to use.
Then we change the `continuous` setting to `true`. This configures the speech recognition API to keep listening. In our very first example, this defaulted to `false`, meaning that the `onresult` event handler would trigger as soon as the user stopped speaking. But as we're allowing the user to control when they want the site to stop listening, we use `continuous` to allow the user to talk for as long as they want:
```javascript
// App.js
let speech;
if (window.webkitSpeechRecognition) {
  // eslint-disable-next-line
  const SpeechRecognition = webkitSpeechRecognition;
  speech = new SpeechRecognition();
  speech.continuous = true;
} else {
  speech = null;
}

const App = () => { ... };
```
Now that we've set up the speech recognition code, we can start to use this inside the React component. As we saw before, we imported two React hooks, `useState` and `useEffect`. These will allow us to add the `onresult` event listener and store the user transcript to state so we can display it on the UI:
```javascript
// App.js
const App = () => {
  const [isListening, setIsListening] = useState(false);
  const [text, setText] = useState("");

  const listen = () => {
    setIsListening(!isListening);
    if (isListening) {
      speech.stop();
    } else {
      speech.start();
    }
  };

  useEffect(() => {
    // handle if the browser does not support the Speech API
    if (!speech) {
      return;
    }
    speech.onresult = event => {
      setText(event.results[event.results.length - 1][0].transcript);
    };
  }, []);

  return (
    <>
      <div className="app">
        <h2>Book Voice Search</h2>
        <h3>Click the Mic and say an author's name</h3>
        <div>
          <img
            className={`microphone ${isListening && "isListening"}`}
            src={Mic}
            alt="microphone"
            onClick={listen}
          />
        </div>
        <p>{text}</p>
      </div>
    </>
  );
};

export default App;
```
In our component, we first declare two state variables: one to hold the transcript text from the user's speech, and one to determine if our application is listening to the user. We call the React `useState` hook, passing a default value of `false` for `isListening` and an empty string for `text`. These values will be updated later in the component based on the user's interactions.
After we set up our state, we create a function that will be triggered when the user clicks the mic image. This checks if the application is currently listening. If it is, we stop the speech recognition; otherwise, we start it. This function is later added to the `onClick` handler for the mic image.
We then need to add our event listener to capture results from the user. We only need to create this event listener once, and we only need it when the UI has rendered. So we can use a `useEffect` hook to capture when the component has mounted and create our `onresult` event. We also pass an empty array to the `useEffect` function so that it will only run once.
Finally, we can render out the UI elements needed to allow the user to start talking and see the text results.
Adding Voice Search to a React Application