{"id":373,"date":"2020-10-15T11:21:54","date_gmt":"2020-10-15T11:21:54","guid":{"rendered":"https:\/\/thenextweb.com\/?p=1323610"},"modified":"2020-10-15T11:21:54","modified_gmt":"2020-10-15T11:21:54","slug":"microsofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures-like-a-human","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=373","title":{"rendered":"Microsoft\u2019s image-captioning AI is pretty darn good at describing pictures like a human"},"content":{"rendered":"\n<p>Microsoft has built a new AI image-captioning system that described photos more accurately than humans in limited tests.<\/p>\n<p>The model has been added to <a href=\"https:\/\/www.microsoft.com\/en-us\/ai\/seeing-ai\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Seeing AI<\/a>, a free app for people with visual impairments that uses a smartphone camera to read text, identify people, and describe objects and surroundings.<\/p>\n<p>It\u2019s also now available to<span>&nbsp;app developers through&nbsp;the Computer Vision API in Azure Cognitive Services, and will start rolling out in Microsoft Word, Outlook, and PowerPoint later this year.<\/span><\/p>\n<p>The model can generate \u201calt text\u201d image descriptions for web pages and documents, an important feature for people with limited vision that\u2019s all-too-often unavailable.<\/p>\n<section class=\"f-content-section f-content-block\">\n<div data-grid=\"container\" readability=\"10.569498069498\">\n<div class=\"f-content-entry m-rich-content-block\" readability=\"38.050193050193\">\n<p class>\u201cIdeally, everyone would include alt text for all images in documents, on the web, in social media \u2013 as this enables people who are blind to access the content and participate in the conversation,\u201d said&nbsp;Saqib Shaikh, a software engineering manager at Microsoft\u2019s AI platform group. \u201cBut, alas, people don\u2019t. So, there are several apps that use image captioning as [a] way to fill in alt text when it\u2019s missing.\u201d<\/p>\n<p><em>[Read:&nbsp;<a href=\"https:\/\/thenextweb.com\/neural\/2020\/10\/13\/microsoft-unveils-efforts-to-make-ai-more-accessible-to-people-with-disabilities\/\">Microsoft unveils efforts to make AI more accessible to people with disabilities<\/a>]<\/em><\/p>\n<\/div>\n<\/div>\n<\/section>\n<p>The algorithm now tops the leaderboard of an image-captioning benchmark called&nbsp;<a href=\"https:\/\/nocaps.org\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">nocaps<\/a>.&nbsp;Microsoft achieved this by&nbsp;<span>pre-training a large AI model on a dataset of images paired with word tags \u2014 rather than full captions, which are less efficient to create. Each of the tags was mapped to a specific object in an image.<\/span><\/p>\n<p>The pre-trained model was then fine-tuned on a dataset of captioned images, which enabled it to compose sentences. It then used its \u201cvisual vocabulary\u201d to create captions for images containing novel objects.<\/p>\n<p>Microsoft said the model is twice as good as the one it\u2019s used in products since 2015. 
The algorithm now tops the leaderboard of an image-captioning benchmark called nocaps (https://nocaps.org/). Microsoft achieved this by pre-training a large AI model on a dataset of images paired with word tags rather than full captions, which are less efficient to create. Each tag was mapped to a specific object in an image.

The pre-trained model was then fine-tuned on a dataset of captioned images, which enabled it to compose sentences. It then used its "visual vocabulary" to create captions for images containing novel objects.
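To make that two-stage recipe easier to follow, here is a purely schematic sketch; the function and method names are assumptions for illustration, not Microsoft's actual training pipeline.

```python
# Schematic outline of the two training stages described above.
# Everything here is a hypothetical illustration, not Microsoft's code.

def pretrain_on_tags(model, tagged_images):
    """Stage 1: build a 'visual vocabulary' from images labeled only with
    word tags, mapping each tag to a specific object region in the image."""
    for image, tags in tagged_images:
        regions = model.detect_objects(image)   # candidate object regions
        model.align(tags, regions)              # learn tag-to-object mapping
    return model


def finetune_on_captions(model, captioned_images):
    """Stage 2: fine-tune on a smaller set of fully captioned images so the
    model learns to compose whole sentences, not just name objects."""
    for image, caption in captioned_images:
        model.learn_to_caption(image, caption)
    return model
```

The payoff of the first stage is that, at inference time, the model can describe objects it never saw in a human-written caption, because they are already part of its visual vocabulary.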
Microsoft said the model is twice as good as the one it has used in its products since 2015. The image below shows how these improvements work in practice:

[Image – Credit: Microsoft. The legacy AI captioned this image as "A person sitting at a table using a laptop." The new model described it as "A person using a microscope."]

However, topping the benchmark doesn't mean the model will be better than humans at image captioning in the real world. Harsh Agrawal, one of the creators of the benchmark, told The Verge (https://www.theverge.com/2020/10/14/21514405/image-captioning-seeing-ai-microsoft-algorithm-word-powerpoint-outlook) that its evaluation metrics "only roughly correlate with human preferences" and that the benchmark "only covers a small percentage of all the possible visual concepts."

Published October 15, 2020 — 11:21 UTC

Source: https://thenextweb.com/neural/2020/10/15/microsofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures-like-a-human/