{"id":1353,"date":"2020-11-24T09:01:23","date_gmt":"2020-11-24T09:01:23","guid":{"rendered":"https:\/\/thenextweb.com\/?p=1329163"},"modified":"2020-11-24T09:01:23","modified_gmt":"2020-11-24T09:01:23","slug":"the-secret-to-powering-web-apps-with-full-speech-recognition","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=1353","title":{"rendered":"The secret to powering web apps with full speech recognition"},"content":{"rendered":"\n<p>A few months ago, I wrote an article on<span>&nbsp;<\/span><a href=\"https:\/\/livecodestream.dev\/post\/2020-06-23-speech-recognition-with-tensorflowjs\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">web speech recognition using TensorflowJS<\/a>. Even though it was super interesting to implement, it was cumbersome for many of you to extend. The reason was pretty simple: it required a deep learning model to be trained if you wanted to detect more words than the model I provided, which was pretty basic.<\/p>\n<p>For those of you who needed a more practical approach, that article wasn\u2019t enough. Following your requests, I\u2019m writing today about how you can bring<span>&nbsp;<\/span><strong>full speech recognition<\/strong>&nbsp;to your web applications using the Web Speech API.<\/p>\n<p>But before we address the actual implementation, let\u2019s understand some scenarios where this functionality may be helpful:<\/p>\n<ul>\n<li>Building an application for situations where it is not possible to use a keyboard or touch devices. 
For example, people working in the field, or wearing special gloves that make interactions with input devices hard.<\/li>\n<li>To support people with disabilities.<\/li>\n<li>Because it\u2019s awesome!<\/li>\n<\/ul>\n<h2 id=\"whats-the-secret-to-powering-web-apps-with-speech-recognition\">What\u2019s the secret to powering web apps with speech recognition?<\/h2>\n<p>The secret is Chrome\u2019s (or Chromium\u2019s)<span>&nbsp;<\/span><a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/API\/SpeechRecognition\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Web Speech API<\/a>. This API, which works in Chromium-based browsers, is fantastic and does all the heavy lifting for us, leaving us free to focus on building better voice interfaces.<\/p>\n<p>As incredible as this API is, as of November 2020 it is not widely supported, and that can be an issue depending on your requirements. Here is the current support status.&nbsp;Additionally, it only works&nbsp;online, so you will need a different setup if you are offline.<\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"lazy loaded lazy\" src=\"https:\/\/livecodestream.dev\/post\/2020-11-22-how-to-control-your-react-app-with-your-voice\/caniuse_hu5f4e2fecd951ac02893a75bfb74d10fa_209197_700x0_resize_box_2.png\" alt=\"Browser support for the Speech Recognition API\" width=\"700\" height=\"185\" data-lazy=\"true\"><figcaption>Can I use it? Chrome speech recognition API<\/figcaption><\/figure>\n<p>Naturally, this API is available through JavaScript, and it is not unique or restricted to React. Nonetheless, there\u2019s a great<span>&nbsp;<\/span><a href=\"https:\/\/github.com\/JamesBrill\/react-speech-recognition#readme\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">React library<\/a><span>&nbsp;<\/span>that simplifies the API even more, and it\u2019s what we are going to use today.<\/p>\n<p>Feel free to read the documentation of the Speech Recognition API on<span>&nbsp;<\/span><a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/API\/SpeechRecognition\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">MDN Docs<\/a><span>&nbsp;<\/span>if you want to do your own implementation in vanilla JS or any other framework.<\/p>\n<p><em>[Read:&nbsp;<a class=\"c-link c-message_attachment__title_link\" href=\"https:\/\/thenextweb.com\/dd\/2020\/11\/09\/heres-how-to-make-your-website-more-accessible\/\" target=\"_blank\" rel=\"noreferrer noopener\" data-qa=\"message_attachment_title_link\"><span dir=\"auto\">Here\u2019s how to make your website more accessible<\/span><\/a>]<\/em><\/p>\n<h2 id=\"hello-world-im-transcribing\">Hello world, I\u2019m transcribing<\/h2>\n<p>We will start with the basics and build a Hello World app that transcribes what the user is saying in <span>real time<\/span>. Before doing all the good stuff, we need a solid working base, so let\u2019s start by setting up our project. 
For simplicity, we will use create-react-app to set up our project.<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1329164 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.39.41.png\" alt width=\"791\" height=\"284\" sizes=\"(max-width: 791px) 100vw, 791px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.39.41.png 1052w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.39.41-280x101.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.39.41-540x194.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.39.41-270x97.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.39.41-796x286.png 796w\"><\/figure>\n<p><\/span><\/p>\n<p>Next, we will work on the file&nbsp;<code>App.js<\/code>. CRA (create-react-app) creates a good starting point for us. 
Just kidding, we won\u2019t need any of it, so start with a blank<span>&nbsp;<\/span><code>App.js<\/code><span>&nbsp;<\/span>file and code with me.<\/p>\n<p>Before we can do anything, we need the<span>&nbsp;<\/span><code>imports<\/code>:<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1329165 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.40.28.png\" alt width=\"795\" height=\"886\" sizes=\"(max-width: 795px) 100vw, 795px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.40.28.png 1060w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.40.28-188x210.png 188w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.40.28-242x270.png 242w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.40.28-121x135.png 121w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.40.28-796x888.png 796w\"><\/figure>\n<p><\/span><\/p>\n<p>Pretty easy, right? Let\u2019s see in detail what we are doing, starting with the<span>&nbsp;<\/span><a href=\"https:\/\/github.com\/JamesBrill\/react-speech-recognition\/blob\/master\/docs\/API.md#useSpeechRecognition\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">useSpeechRecognition<\/a><span>&nbsp;<\/span>hook.<\/p>\n<p>This hook is responsible for capturing the results of the speech recognition process. It\u2019s our gateway to producing the desired results. 
In its simplest form, we can extract the<span>&nbsp;<\/span><code>transcript<\/code><span>&nbsp;<\/span>of what the user is saying while the microphone is enabled, as we do here:<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1329166 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.41.33.png\" alt width=\"813\" height=\"119\" sizes=\"(max-width: 813px) 100vw, 813px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.41.33.png 1050w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.41.33-280x41.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.41.33-540x79.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.41.33-270x40.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.41.33-796x117.png 796w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.41.33-1044x154.png 1044w\"><\/figure>\n<p><\/span><\/p>\n<p>Activating the hook doesn\u2019t immediately start listening, though; for that, we need to interact with the<span>&nbsp;<\/span><a href=\"https:\/\/github.com\/JamesBrill\/react-speech-recognition\/blob\/master\/docs\/API.md#SpeechRecognition\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">SpeechRecognition<\/a><span>&nbsp;<\/span>object that we imported at the beginning. 
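The screenshots above show the original code as images. As a rough sketch of the same idea, assuming only the exports documented by react-speech-recognition (the component markup and names here are my own), a minimal transcribing `App.js` could look like this:

```javascript
// App.js - a minimal transcription sketch. SpeechRecognition and
// useSpeechRecognition are the library's documented exports; the
// buttons and layout are illustrative.
import React from "react";
import SpeechRecognition, { useSpeechRecognition } from "react-speech-recognition";

function App() {
  // transcript holds the recognized speech; resetTranscript clears it.
  const { transcript, resetTranscript } = useSpeechRecognition();

  return (
    <div>
      <p>Start listening for transcript:</p>
      {/* The SpeechRecognition object controls the microphone. */}
      <button onClick={() => SpeechRecognition.startListening()}>Start listening</button>
      <button onClick={SpeechRecognition.stopListening}>Stop listening</button>
      <button onClick={resetTranscript}>Reset</button>
      <p>{transcript}</p>
    </div>
  );
}

export default App;
```

Once the browser grants microphone permission, clicking Start listening begins filling `transcript` in real time.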
This object exposes a series of methods that help us control the speech recognition API: starting and stopping the microphone, changing languages, and so on.<\/p>\n<p>Our interface simply exposes two buttons for controlling the microphone status; if you copied the provided code, your interface should look and behave like this:<\/p>\n<div id=\"demo-voice-recognition\" class=\"callout\" readability=\"7\">\n<div readability=\"9\">\n<p><strong>Hello world!<\/strong><\/p>\n<p>Start listening for transcript:<\/p>\n<p><button>Start listening<\/button>&nbsp;<button>Stop listening<\/button>&nbsp;<button>Reset<\/button><\/p>\n<\/div>\n<\/div>\n<p>If you tried the demo application, you might have noticed that words went missing whenever you paused while speaking. This is the library\u2019s default behavior, but you can change it by setting the parameter<span>&nbsp;<\/span><code>continuous<\/code><span>&nbsp;<\/span>on the<span>&nbsp;<\/span><code>startListening<\/code><span>&nbsp;<\/span>method, like this:<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1329167 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.06.png\" alt width=\"790\" height=\"119\" sizes=\"(max-width: 790px) 100vw, 790px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.06.png 1062w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.06-280x42.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.06-540x81.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.06-270x41.png 270w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.06-796x120.png 796w\"><\/figure>\n<p><\/span><\/p>\n<h2 id=\"compatibility-detection\">Compatibility detection<\/h2>\n<p>Our app is nice! But what happens if your browser is not supported? Can we have a fallback behavior for those scenarios? Yes, we can. If you need to change your app\u2019s behavior based on whether the speech recognition API is supported or not,<span>&nbsp;<\/span><code>react-speech-recognition<\/code><span>&nbsp;<\/span>has a method for exactly this purpose. Here is an example:<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1329168 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.11.png\" alt width=\"802\" height=\"261\" sizes=\"(max-width: 802px) 100vw, 802px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.11.png 1082w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.11-280x91.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.11-540x176.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.11-270x88.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.42.11-796x259.png 796w\"><\/figure>\n<p><\/span><\/p>\n<h2 id=\"detecting-commands\">Detecting commands<\/h2>\n<p>So far, we covered how to convert voice into text, but now we will take it one step further by recognizing pre-defined commands in our app. 
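Before wiring up commands, here is a sketch that combines the two adjustments from the previous sections: continuous listening and the support check. The `browserSupportsSpeechRecognition` method and the `continuous` option are taken from the library's documentation; the component and fallback markup are my own:

```javascript
// Sketch: support detection plus continuous listening.
import React from "react";
import SpeechRecognition, { useSpeechRecognition } from "react-speech-recognition";

function App() {
  const { transcript } = useSpeechRecognition();

  // Fallback behavior for browsers without the underlying Web Speech API.
  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return <p>Sorry, your browser does not support speech recognition.</p>;
  }

  // continuous: true keeps the microphone open across pauses,
  // so words are no longer dropped when the speaker stops briefly.
  return (
    <div>
      <button onClick={() => SpeechRecognition.startListening({ continuous: true })}>
        Start listening
      </button>
      <p>{transcript}</p>
    </div>
  );
}

export default App;
```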
Building this functionality will make it possible to create apps that can function fully by voice.<\/p>\n<p>Building a command parser ourselves could be a lot of work, but thankfully, the<span>&nbsp;<\/span><code>react-speech-recognition<\/code><span>&nbsp;<\/span>library already has built-in command recognition functionality.<\/p>\n<p>To respond when the user says a particular phrase, you can pass in a list of commands to the<span>&nbsp;<\/span><code>useSpeechRecognition<\/code><span>&nbsp;<\/span>hook. Each command is an object with the following properties:<\/p>\n<ul>\n<li>\n<code>command<\/code>: A string or RegExp representing the phrase you want to listen for<\/li>\n<li>\n<code>callback<\/code>: The function that is executed when the command is spoken. The last argument that this function receives will always be an object containing the following property:\n<ul>\n<li>\n<code>resetTranscript<\/code>: A function that sets the transcript to an empty string<\/li>\n<\/ul>\n<\/li>\n<li>\n<code>matchInterim<\/code>: Boolean that determines whether \u201cinterim\u201d results should be matched against the command. This will make your component respond faster to commands, but also makes false positives more likely \u2013 i.e. the command may be detected when it is not spoken. This is false by default and should only be set for simple commands.<\/li>\n<li>\n<code>isFuzzyMatch<\/code>: Boolean that determines whether the comparison between speech and command is based on similarity rather than an exact match. Fuzzy matching is useful for commands that are easy to mispronounce or be misinterpreted by the Speech Recognition engine (e.g. names of places, sports teams, restaurant menu items). It is intended for commands that are string literals without special characters. If command is a string with special characters or a RegExp, it will be converted to a string without special characters when fuzzy matching. The similarity that is needed to match the command can be configured with<span>&nbsp;<\/span><code>fuzzyMatchingThreshold<\/code>. isFuzzyMatch is false by default. When it is set to true, it will pass four arguments to callback:\n<ul>\n<li>The value of<span>&nbsp;<\/span><code>command<\/code><\/li>\n<li>The speech that matched command<\/li>\n<li>The similarity between command and the speech<\/li>\n<li>The object mentioned in the callback description above<\/li>\n<\/ul>\n<\/li>\n<li>\n<code>fuzzyMatchingThreshold<\/code>: If the similarity of speech to command is higher than this value when isFuzzyMatch is turned on, the callback will be invoked. You should set this only if isFuzzyMatch is true. It takes values between 0 (will match anything) and 1 (needs an exact match). The default value is 0.8.<\/li>\n<\/ul>\n<p>Here is an example of how to pre-define commands for your application:<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1329169 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.43.03.png\" alt width=\"785\" height=\"848\" sizes=\"(max-width: 785px) 100vw, 785px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.43.03.png 1060w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.43.03-195x210.png 195w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.43.03-250x270.png 250w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.43.03-125x135.png 125w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/11\/Screenshot-2020-11-24-at-09.43.03-796x859.png 796w\"><\/figure>\n<p><\/span><\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>Thanks to Chrome\u2019s speech recognition APIs, building voice-activated apps couldn\u2019t be easier or more fun. 
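To tie it all together, here is a sketch of a commands setup along the lines of the screenshot above. The `command`, `callback`, `matchInterim`, `isFuzzyMatch`, and `fuzzyMatchingThreshold` fields follow the library's documentation, but the phrases and handlers are made up for illustration:

```javascript
// Sketch: pre-defined voice commands with react-speech-recognition.
import React, { useState } from "react";
import SpeechRecognition, { useSpeechRecognition } from "react-speech-recognition";

function App() {
  const [message, setMessage] = useState("");

  // Each entry follows the command object shape described above.
  const commands = [
    {
      // "*" captures free-form speech and passes it to the callback.
      command: "I would like to order *",
      callback: (food) => setMessage(`Your order is for: ${food}`),
    },
    {
      command: "clear",
      // The last callback argument is the object containing resetTranscript.
      callback: ({ resetTranscript }) => resetTranscript(),
      matchInterim: true,
    },
    {
      command: "Lisbon",
      callback: (command, spokenPhrase, similarityRatio) =>
        setMessage(`${command} matched "${spokenPhrase}" (${similarityRatio})`),
      isFuzzyMatch: true,
      fuzzyMatchingThreshold: 0.8,
    },
  ];

  const { transcript } = useSpeechRecognition({ commands });

  return (
    <div>
      <button onClick={() => SpeechRecognition.startListening({ continuous: true })}>
        Start listening
      </button>
      <p>{message}</p>
      <p>{transcript}</p>
    </div>
  );
}

export default App;
```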
Hopefully, in the near future, we\u2019ll see this API supported by more browsers and with offline capabilities. Then, it will become a very powerful API that may change the way we build the web.<\/p>\n<p><i><span>This <\/span><\/i><a href=\"https:\/\/livecodestream.dev\/post\/2020-11-22-how-to-control-your-react-app-with-your-voice\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>article<\/span><\/i><\/a><i><span> was originally published on <\/span><\/i><a href=\"https:\/\/livecodestream.dev\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>Live Code Stream<\/span><\/i><\/a><i><span> by <\/span><\/i><a href=\"https:\/\/www.linkedin.com\/in\/bajcmartinez\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>Juan Cruz Martinez<\/span><\/i><\/a><i><span> (twitter: <\/span><\/i><a href=\"https:\/\/twitter.com\/bajcmartinez\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>@bajcmartinez<\/span><\/i><\/a><i><span>), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and doer of things.<\/span><\/i><\/p>\n<p><a href=\"https:\/\/livecodestream.dev\/subscribe\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>Live Code Stream<\/span><\/i><\/a><i><span> is also available as a free weekly newsletter. Sign up for updates on everything related to programming, AI, and computer science in general.<\/span><\/i><\/p>\n<p> <a href=\"https:\/\/thenextweb.com\/syndication\/2020\/11\/24\/the-secret-to-powering-web-apps-with-full-speech-recognition\/\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few months ago, I wrote an article on&nbsp;web speech recognition using TensorflowJS. Even though it was super interesting to implement, it was cumbersome for many of you to extend. 
The reason&#8230;<\/p>\n","protected":false},"author":1,"featured_media":1354,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/1353"}],"collection":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1353"}],"version-history":[{"count":0,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/1353\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/media\/1354"}],"wp:attachment":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1353"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1353"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1353"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}