{"id":806,"date":"2020-10-30T08:25:02","date_gmt":"2020-10-30T08:25:02","guid":{"rendered":"https:\/\/thenextweb.com\/?p=1326119"},"modified":"2020-10-30T08:25:02","modified_gmt":"2020-10-30T08:25:02","slug":"how-nvidias-maxine-uses-ai-to-improve-video-calls","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=806","title":{"rendered":"How Nvidia\u2019s Maxine uses AI to improve video calls"},"content":{"rendered":"\n<p>One of the things that caught my eye at Nvidia\u2019s flagship event, the GPU Technology Conference (GTC), was Maxine, a platform that leverages artificial intelligence to improve the quality and experience of video-conferencing applications in real time.<\/p>\n<p>Maxine uses <a href=\"https:\/\/bdtechtalks.com\/2019\/02\/15\/what-is-deep-learning-neural-networks\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">deep learning<\/a> for resolution improvement, background noise reduction, video compression, face alignment, and real-time translation and transcription.<\/p>\n<p>In this post, which marks the first installment of our \u201cdeconstructing artificial intelligence\u201d series, we will take a look at how some of these features work and how they tie in with AI research done at Nvidia. 
We\u2019ll also explore the pending issues and the possible business model for Nvidia\u2019s AI-powered video-conferencing platform.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/www.youtube.com\/embed\/eFK7Iy8enqM\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\">[embedded content]<\/iframe><\/p>\n<h2>Super-resolution with neural networks<\/h2>\n<p>The first feature shown in the Maxine presentation is \u201csuper resolution,\u201d which, <a href=\"https:\/\/developer.nvidia.com\/maxine\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">according to Nvidia<\/a>, \u201ccan convert lower resolutions to higher resolution videos in real time.\u201d Super resolution enables video-conference callers to send lo-res video streams and have them upscaled at the server. This reduces the bandwidth requirement of video-conference applications and can make their performance more reliable in areas with poor network connectivity.<\/p>\n<p><em>[Read: <span class=\"c-message_attachment__title\"><a class=\"c-link c-message_attachment__title_link\" href=\"https:\/\/thenextweb.com\/politics\/2020\/10\/16\/what-audience-intelligence-data-tells-us-about-the-2020-us-presidential-election\/\" target=\"_blank\" rel=\"noreferrer noopener\" data-qa=\"message_attachment_title_link\"><span dir=\"auto\">What audience intelligence data tells us about the 2020 US presidential election<\/span><\/a>]<\/span><\/em><\/p>\n<p>The big challenge of upscaling visual data is filling in the missing information. You have a limited array of pixels that represent an image, and you want to expand it to a larger canvas that contains many more pixels. 
How do you decide what color values those new pixels get?<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-8559 jetpack-lazy-image jetpack-lazy-image--handled\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=696%2C432&amp;ssl=1\" sizes=\"(max-width: 696px) 100vw, 696px\" srcset=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=1024%2C636&amp;ssl=1 1024w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=300%2C186&amp;ssl=1 300w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=768%2C477&amp;ssl=1 768w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=356%2C220&amp;ssl=1 356w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=696%2C432&amp;ssl=1 696w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=1068%2C663&amp;ssl=1 1068w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?resize=677%2C420&amp;ssl=1 677w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?w=1366&amp;ssl=1 1366w\" alt=\"upscaling\" width=\"696\" height=\"432\" data-attachment-id=\"8559\" data-permalink=\"https:\/\/bdtechtalks.com\/2020\/10\/19\/nvidia-maxine-ai-video-conferencing\/upscaling\/\" data-orig-file=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?fit=1366%2C848&amp;ssl=1\" data-orig-size=\"1366,848\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"upscaling\" data-image-description data-medium-file=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?fit=300%2C186&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/upscaling.jpg?fit=696%2C432&amp;ssl=1\" data-recalc-dims=\"1\" data-lazy-loaded=\"1\"><\/figure>\n<p>Old upscaling techniques use different interpolation methods (bicubic, Lanczos, etc.) to fill the space between pixels. These techniques are too general and might provide mixed results across different types of images and backgrounds.<\/p>\n<p>One of the benefits of <a href=\"https:\/\/bdtechtalks.com\/2017\/08\/28\/artificial-intelligence-machine-learning-deep-learning\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">machine learning algorithms<\/a> is that they can be tuned to perform very specific tasks. For instance, a deep neural network can be trained on scaled-down video frames grabbed from video-conference streams and their corresponding hi-res original images. With enough examples, the neural network will tune its parameters to the general features found in video-conference visual data (mostly faces) and will be able to provide a better low- to hi-res conversion than general-purpose upscaling algorithms. 
In general, the narrower the domain, the better the neural network\u2019s chances of converging on highly accurate performance.<\/p>\n<p>There\u2019s already a solid body of research on using <a href=\"https:\/\/bdtechtalks.com\/2019\/08\/05\/what-is-artificial-neural-network-ann\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">artificial neural networks<\/a> for upscaling visual data, including <a href=\"https:\/\/developer.download.nvidia.com\/assets\/gameworks\/downloads\/regular\/GDC17\/DeepLearning_MaterialsTextures_GDC17_FINAL.pdf?xLIIHDMriTGFZ9LxSH7BvPef2lyOYmzQWT_eS84zqUgBPBCk2YA9QlGN5RREerBKU3boYkj1DBxXbmSg_4ZKg6NnpZypvig2RXZqZiTjrHbHn1IwS3-kGifKcYyiELNbDJoB_weGlkTlsb1Jf75s7Cd7eOyq0ldWHtIfY07QRiTixrBOlRKD02sLfXarCT-N1b7Ovg-J\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">a 2017 Nvidia paper<\/a> that discusses general super resolution with deep neural networks. With video-conferencing being a very specialized case, a well-trained neural network is bound to perform even better than it does on more general tasks. Aside from video conferencing, there are applications for this technology in other areas, such as the film industry, which uses deep learning to remaster old videos to higher quality.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/www.youtube.com\/embed\/RhUmSeko1ZE\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\">[embedded content]<\/iframe><\/p>\n<h2>Video compression with neural networks<\/h2>\n<p>One of the more interesting parts of the Maxine presentation was the AI video compression feature. 
The video posted on Nvidia\u2019s YouTube shows that using neural networks to compress video streams reduces bandwidth from ~97 KB\/frame to ~0.12 KB\/frame, which is a bit exaggerated, as <a href=\"https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/j6n90c\/d_deconstructing_nvidia_maxine\/g7zwg2z\/?utm_source=reddit&amp;utm_medium=web2x&amp;context=3\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">users have pointed out on Reddit<\/a>. Nvidia\u2019s website <a href=\"https:\/\/developer.nvidia.com\/maxine?ncid=so-yout-79832#cid=dl13_so-yout_en-us\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">states<\/a> developers can reduce bandwidth use down to \u201cone-tenth of the bandwidth needed for the H.264 video compression standard,\u201d which is a much more reasonable\u2014and still impressive\u2014figure.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/www.youtube.com\/embed\/NqmMnjJ6GEg\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\">[embedded content]<\/iframe><\/p>\n<p>How does Nvidia\u2019s AI achieve such impressive compression rates? A <a href=\"https:\/\/blogs.nvidia.com\/blog\/2020\/10\/05\/gan-video-conferencing-maxine\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">blog post<\/a> on Nvidia\u2019s website provides more detail on how the technology works. A neural network extracts and encodes the locations of key facial features of the user for each frame, which is much more efficient than compressing pixel and color data. The encoded data is then passed on to a <a href=\"https:\/\/bdtechtalks.com\/2018\/05\/28\/generative-adversarial-networks-artificial-intelligence-ian-goodfellow\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">generative adversarial network<\/a> along with a reference video frame captured at the beginning of the session. 
The GAN is trained to reconstruct the new image by projecting the facial features onto the reference frame.<\/p>\n<figure class=\"wp-block-image size-large\" readability=\"3\">\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"jetpack-lazy-image jetpack-lazy-image--handled wp-image-8560 lazy\" src=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=696%2C364&amp;ssl=1\" sizes=\"(max-width: 696px) 100vw, 696px\" alt=\"AI video compression\" width=\"696\" height=\"364\" data-attachment-id=\"8560\" data-permalink=\"https:\/\/bdtechtalks.com\/2020\/10\/19\/nvidia-maxine-ai-video-conferencing\/ai-video-compression\/\" data-orig-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?fit=1680%2C877&amp;ssl=1\" data-orig-size=\"1680,877\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"AI video compression\" data-image-description data-medium-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?fit=300%2C157&amp;ssl=1\" data-large-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?fit=696%2C364&amp;ssl=1\" data-recalc-dims=\"1\" data-lazy-loaded=\"1\" data-lazy=\"true\" data-srcset=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=1024%2C535&amp;ssl=1 1024w, 
https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=300%2C157&amp;ssl=1 300w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=768%2C401&amp;ssl=1 768w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=1536%2C802&amp;ssl=1 1536w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=696%2C363&amp;ssl=1 696w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=1068%2C558&amp;ssl=1 1068w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?resize=805%2C420&amp;ssl=1 805w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?w=1680&amp;ssl=1 1680w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/AI-video-compression.jpg?w=1392&amp;ssl=1 1392w\"><figcaption>Deep neural networks extract and encode key facial features. Generative adversarial networks then project those encodings onto a reference frame with the user\u2019s face<\/figcaption><\/figure>\n<\/p>\n<\/figure>\n<p>The work builds on <a href=\"https:\/\/arxiv.org\/abs\/1903.07291\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">previous GAN research<\/a> done at Nvidia, which mapped rough sketches to rich, detailed images and drawings.<\/p>\n<p>The AI video compression feature shows once again how narrow domains provide excellent settings for the use of deep learning algorithms.<\/p>\n<h2>Face realignment with deep learning<\/h2>\n<p>The face alignment feature readjusts the angle of users\u2019 faces to make it appear as if they\u2019re looking directly at the camera. Misaligned gaze is a very common problem in video conferencing because people tend to look at the faces of others on the screen rather than gaze at the camera.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-8562 jetpack-lazy-image jetpack-lazy-image--handled\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/face-alignment.gif?resize=640%2C360&amp;ssl=1\" alt=\"NVidia AI face alignment\" width=\"640\" height=\"360\" data-attachment-id=\"8562\" data-permalink=\"https:\/\/bdtechtalks.com\/2020\/10\/19\/nvidia-maxine-ai-video-conferencing\/face-alignment\/\" data-orig-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/face-alignment.gif?fit=640%2C360&amp;ssl=1\" data-orig-size=\"640,360\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"NVidia AI face alignment\" data-image-description data-medium-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/face-alignment.gif?fit=300%2C169&amp;ssl=1\" data-large-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/face-alignment.gif?fit=640%2C360&amp;ssl=1\" data-recalc-dims=\"1\" data-lazy-loaded=\"1\"><\/figure>\n<p>Although there isn\u2019t much detail about how this works, the blog post mentions that they use GANs. It\u2019s not hard to see how this feature can be bundled with the AI compression\/decompression technology. Nvidia has already done extensive research on <a href=\"https:\/\/arxiv.org\/abs\/1709.01591\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">landmark detection and encoding<\/a>, including the extraction of facial features and gaze direction at different angles. The encodings can be fed to the same GAN that projects the facial features onto the reference image and let it do the rest.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/www.youtube.com\/embed\/ZYx3jek0KCs\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\">[embedded content]<\/iframe><\/p>\n<h2>Where does Maxine run its deep learning models?<\/h2>\n<p>There are a lot of other neat features in Maxine, including the integration with JARVIS, Nvidia\u2019s conversational AI platform. Getting into all of that would be beyond the scope of this article.<\/p>\n<p>But some technical issues remain to be resolved. 
For instance, one issue is how much of Maxine\u2019s functionality will run on cloud servers and how much of it on user devices. In response to a query from <em>TechTalks<\/em>, a spokesperson for Nvidia said, \u201cNVIDIA Maxine is designed to execute the AI features in the cloud so that every user can access them, regardless of the device they\u2019re using.\u201d<\/p>\n<p>This makes sense for some of the features, such as super resolution, virtual background, auto-frame, and noise reduction. But it seems pointless for others. Take, for example, the AI video compression feature. Ideally, the neural network doing the facial-expression encoding would run on the sender\u2019s device, and the GAN that reconstructs the video frame would run on the receiver\u2019s device. If all these functions were carried out on servers, there would be no bandwidth savings, because users would send and receive full frames instead of the much lighter facial-expression encodings.<\/p>\n<p>Ideally, there should be some sort of configuration that allows users to choose between local and on-cloud AI inference to strike the right balance between network and compute availability. For instance, a user who has a workstation with a strong GPU might want to run all deep learning models on their computer in exchange for lower bandwidth usage or cost savings. 
On the other hand, a user joining a conference from a mobile device with low processing power would forgo the local AI compression and defer virtual background and noise reduction to the Maxine server.<\/p>\n<h2>What is Maxine\u2019s business model?<\/h2>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-8564 jetpack-lazy-image jetpack-lazy-image--handled\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=696%2C379&amp;ssl=1\" sizes=\"(max-width: 696px) 100vw, 696px\" srcset=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=1024%2C558&amp;ssl=1 1024w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=300%2C164&amp;ssl=1 300w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=768%2C419&amp;ssl=1 768w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=1536%2C837&amp;ssl=1 1536w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=2048%2C1117&amp;ssl=1 2048w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=696%2C379&amp;ssl=1 696w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=1068%2C582&amp;ssl=1 1068w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=770%2C420&amp;ssl=1 770w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?resize=1920%2C1047&amp;ssl=1 1920w, https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?w=1392&amp;ssl=1 1392w\" alt=\"NVidia AI stack\" width=\"696\" height=\"379\" data-attachment-id=\"8564\" 
data-permalink=\"https:\/\/bdtechtalks.com\/2020\/10\/19\/nvidia-maxine-ai-video-conferencing\/nvidia-ai-stack\/\" data-orig-file=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?fit=3360%2C1832&amp;ssl=1\" data-orig-size=\"3360,1832\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"NVidia AI stack\" data-image-description data-medium-file=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?fit=300%2C164&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/10\/NVidia-AI-stack.jpg?fit=696%2C379&amp;ssl=1\" data-recalc-dims=\"1\" data-lazy-loaded=\"1\"><\/figure>\n<p>With the COVID-19 pandemic pushing companies to implement remote-working protocols, it seems as good a time as any to market video-conferencing apps. And with AI still at the peak of its hype cycle, companies have a tendency to rebrand their products as \u201c<a href=\"https:\/\/bdtechtalks.com\/2018\/10\/08\/artificial-intelligence-vs-machine-learning\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">AI-powered<\/a>\u201d to improve sales. So, I\u2019m generally a bit skeptical about anything that has \u201cvideo conferencing\u201d and \u201cAI\u201d in its name these days, and I think many of these products will not live up to the promise.<\/p>\n<p>But I have a few reasons to believe Nvidia\u2019s Maxine will succeed where others fail. 
First, Nvidia has a track record of doing reliable deep learning research, especially in <a href=\"https:\/\/bdtechtalks.com\/2019\/01\/14\/what-is-computer-vision\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">computer vision<\/a> and more recently in <a href=\"https:\/\/bdtechtalks.com\/2018\/02\/20\/ai-machine-learning-nlg-nlp\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">natural language processing<\/a>. The company also has the infrastructure and financial means to continue to develop and improve its AI models and make them available to its customers. Nvidia\u2019s GPU servers and its partnerships with cloud providers will enable it to scale as its customer base grows. And its recent <a href=\"https:\/\/www.theverge.com\/2020\/9\/14\/21435890\/nvidia-arm-acquisition-40-billion-ai-cloud-edge-why\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">acquisition of mobile chipmaker ARM<\/a> will put it in a suitable position to move some of these AI capabilities to the edge (maybe a Maxine-powered video-conferencing camera in the future?).<\/p>\n<p>Finally, Maxine is an ideal example of <a href=\"https:\/\/bdtechtalks.com\/2020\/04\/09\/what-is-narrow-artificial-intelligence-ani\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">narrow AI<\/a> being put to good use. As opposed to <a href=\"https:\/\/bdtechtalks.com\/2019\/12\/30\/computer-vision-applications-deep-learning\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">computer vision applications<\/a> that try to address a wide range of issues, all of Maxine\u2019s features are tailored for a special setting: a person talking to a camera. As <a href=\"https:\/\/bdtechtalks.com\/2019\/12\/16\/objectnet-dataset-ai-computer-vision\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">various experiments have shown<\/a>, even the most advanced deep learning algorithms lose their accuracy and stability as their problem domain expands. 
Conversely, neural networks are more likely to capture the real data distribution as their problem domain becomes narrower.<\/p>\n<p>But as we\u2019ve seen on these pages before, <a href=\"https:\/\/bdtechtalks.com\/2020\/09\/21\/gpt-3-economy-business-model\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">there\u2019s a huge difference<\/a> between an interesting piece of technology that works and one that has a successful business model.<\/p>\n<p>Maxine is currently in early access mode, so a lot of things might change in the future. For the moment, Nvidia plans to make it available as an SDK and a set of APIs hosted on Nvidia\u2019s servers that developers can integrate into their video-conferencing applications. Corporate video conferencing already has two big players, Teams and Zoom. Teams already has plenty of AI-powered features, and it wouldn\u2019t be hard for Microsoft to add some of the functionalities Maxine offers.<\/p>\n<p>What will be the final pricing model for Maxine? Will the benefits provided by the bandwidth savings be enough to justify the costs? Will there be incentives for large players such as Zoom and Microsoft Teams to partner with Nvidia, or will they add their own versions of the same features? Will Nvidia continue with the SDK\/API model or develop its own standalone video-conferencing platform? Nvidia will have to answer these and many other questions as developers explore its new AI-powered video-conferencing platform.<\/p>\n<hr>\n<p><i><span>This article was originally published by Ben Dickson on <\/span><\/i><a href=\"https:\/\/bdtechtalks.com\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>TechTalks<\/span><\/i><\/a><i><span>, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech and what we need to look out for. 
You can read the original article <a href=\"https:\/\/bdtechtalks.com\/2020\/10\/19\/nvidia-maxine-ai-video-conferencing\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">here<\/a>.<\/span><\/i><\/p>\n<p class=\"c-post-pubDate\"> Published October 30, 2020 \u2014 08:25 UTC <\/p>\n<p> <a href=\"https:\/\/thenextweb.com\/neural\/2020\/10\/30\/how-nvidias-maxine-uses-ai-to-improve-video-calls-syndication\/\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the things that caught my eye at Nvidia\u2019s flagship event, the GPU Technology Conference (GTC), was Maxine, a platform that leverages artificial intelligence to improve the quality and experience of&#8230;<\/p>\n","protected":false},"author":1,"featured_media":807,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/806"}],"collection":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=806"}],"version-history":[{"count":0,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/806\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/media\/807"}],"wp:attachment":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=806"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=806"},{"taxonomy":
"post_tag","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}