Londonchiropracter.com

This domain is available to be leased

Menu
Menu

The secret to powering web apps with full speech recognition

Posted on November 24, 2020 by admin

A few months ago, I wrote an article on web speech recognition using TensorflowJS. Even though it was super interesting to implement, it was cumbersome for many of you to extend. The reason was pretty simple: it required a deep learning model to be trained if you wanted to detect more words than the model I provided, which was pretty basic.

For those of you who needed a more practical approach, that article wasn’t enough. Following your requests, I’m writing today about how you can bring full speech recognition to your web applications using the Web Speech API.

But before we address the actual implementation, let’s understand some scenarios where this functionality may be helpful:

  • Building an application for situations where it is not possible to use a keyboard or touch devices. For example, people working in the field of special globes make interactions with input devices hard.
  • To support people with disabilities.
  • Because it’s awesome!

What’s the secret to powering web apps with speech recognition?

The secret is Chrome (or Chromium) Web Speech API . This API, which works with Chromium-based browsers, is fantastic and does all the heavy work for us, leaving us to only care about building better interfaces using voice.

However incredible this API is, as of Nov 2020, it is not widely supported, and that can be an issue depending on your requirements. Here is the current support status. Additionally, it only works online, so you will need a different setup if you are offline.

Can I use it? Chrome speech recognition API

Naturally, this API is available through JavaScript, and it is not unique or restricted to React. Nonetheless, there’s a great React library that simplifies the API even more, and it’s what we are going to use today.

Feel free to read the documentation of the Speech Recognition API on MDN Docs if you want to do your implementation on vanilla JS or any other framework.

[Read: Here’s how to make your website more accessible]

Hello world, I’m transcribing

We will start with the basics, and we will build a Hello World app that will transcribe in real-time what the user is saying. Before doing all the good stuff, we need a good working base, so let’s start setting up our project. For simplicity, we will use create-react-app to set up our project.

Next, we will work on the file App.js. CRA (create-react-app) creates a good starting point for us. Just kidding, we won’t need any of it, so start with a blank App.js file and code with me.

Before we can do anything, we need the imports:

Pretty easy, right? Let’s see in detail what we are doing, starting with the useSpeechRecognition hook.

This hook is responsible for capturing the results of the speech recognition process. It’s our gateway to producing the desire results. In its simplest form, we can extract the transcript of what the user is saying when the microphone is enabled as we do here:

Even when we activate the hook, we don’t start immediately listening; for that, we need to interact with the object SpeechRecognition that we imported at the beginning. This object exposes a series of methods that will help us control the speech recognition API, methods to start listening on the microphone, stop, change languages, etc.

Our interface simply exposes two buttons for controlling the microphone status; if you copied the provided code, your interface should look and behave like this:

Hello world!

Start listening for transcript:

  

If you tried the demo application, you might have noticed that you had missing words if you perhaps paused after listening. This is because the library by default sets this behavior, but you can change it by setting the parameter continuous on the startListening method, like this:

Compatibility detection

Our app is nice! But what happens if your browser is not supported? Can we have a fallback behavior for those scenarios? Yes, we can. If you need to change your app’s behavior based on whether the speech recognition API is supported or not, react-speech-recognition has a method for exactly this purpose. Here is an example:

Detecting commands

So far, we covered how to convert voice into text, but now we will take it one step further by recognizing pre-defined commands in our app. Building this functionality will make it possible for us to build apps that can fully function by voice.

If we need to build a command parser, it could be a lot of work, but thankfully, the speech recognition API already has a built-in command recognition functionality.

To respond when the user says a particular phrase, you can pass in a list of commands to the useSpeechRecognition hook. Each command is an object with the following properties:

  • command: This is a string or RegExp representing the phrase you want to listen for
  • callback: The function that is executed when the command is spoken. The last argument that this function receives will always be an object containing the following properties:
  • resetTranscript: A function that sets the transcript to an empty string
  • matchInterim: Boolean that determines whether “interim” results should be matched against the command. This will make your component respond faster to commands, but also makes false positives more likely – i.e. the command may be detected when it is not spoken. This is false by default and should only be set for simple commands.
  • isFuzzyMatch: Boolean that determines whether the comparison between speech and command is based on similarity rather than an exact match. Fuzzy matching is useful for commands that are easy to mispronounce or be misinterpreted by the Speech Recognition engine (e.g. names of places, sports teams, restaurant menu items). It is intended for commands that are string literals without special characters. If command is a string with special characters or a RegExp, it will be converted to a string without special characters when fuzzy matching. The similarity that is needed to match the command can be configured with fuzzyMatchingThreshold. isFuzzyMatch is false by default. When it is set to true, it will pass four arguments to callback:

    • The value of command
    • The speech that matched command
  • The similarity between command and the speech
  • The object mentioned in the callback description above
  • fuzzyMatchingThreshold: If the similarity of speech to command is higher than this value when isFuzzyMatch is turned on, the callback will be invoked. You should set this only if isFuzzyMatch is true. It takes values between 0 (will match anything) and 1 (needs an exact match). The default value is 0.8.

Here is an example of how to pre-define commands for your application:

Conclusion

Thanks to Chrome’s speech recognition APIs, building voice-activated apps couldn’t be easier and more fun. Hopefully, in the near future, we’ll see this API supported by more browsers and with offline capabilities. Then, it will become a very powerful API that may change the way we build the web.

This article was originally published on Live Code Stream by Juan Cruz Martinez (twitter: @bajcmartinez), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and doer of things.

Live Code Stream is also available as a free weekly newsletter. Sign up for updates on everything related to programming, AI, and computer science in general.

Source

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • Trump says Anthropic Pentagon deal is ‘possible’, weeks after blacklisting the company as a national security risk
  • Samsung and IKEA just made the $6 smart home real, and your TV is already the hub
  • OpenAI recruits Cognizant and CGI to take Codex into enterprise software shops worldwide
  • Lovable left thousands of projects exposed for 48 days, and the vibe coding security crisis is only getting worse
  • Humble emerges from stealth with $24M and a cableless autonomous electric truck built to go dock-to-dock

Recent Comments

    Archives

    • April 2026
    • March 2026
    • February 2026
    • January 2026
    • December 2025
    • September 2025
    • August 2025
    • July 2025
    • June 2025
    • May 2025
    • April 2025
    • March 2025
    • February 2025
    • January 2025
    • December 2024
    • November 2024
    • October 2024
    • September 2024
    • August 2024
    • July 2024
    • June 2024
    • May 2024
    • April 2024
    • March 2024
    • February 2024
    • January 2024
    • December 2023
    • November 2023
    • October 2023
    • September 2023
    • August 2023
    • July 2023
    • June 2023
    • May 2023
    • April 2023
    • March 2023
    • February 2023
    • January 2023
    • December 2022
    • November 2022
    • October 2022
    • September 2022
    • August 2022
    • July 2022
    • June 2022
    • May 2022
    • April 2022
    • March 2022
    • February 2022
    • January 2022
    • December 2021
    • November 2021
    • October 2021
    • September 2021
    • August 2021
    • July 2021
    • June 2021
    • May 2021
    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • November 2020
    • October 2020

    Categories

    • Uncategorized

    Meta

    • Log in
    • Entries feed
    • Comments feed
    • WordPress.org
    ©2026 Londonchiropracter.com | Design: Newspaperly WordPress Theme