{"id":1523,"date":"2020-12-02T11:00:08","date_gmt":"2020-12-02T11:00:08","guid":{"rendered":"https:\/\/thenextweb.com\/?p=1330155"},"modified":"2020-12-02T11:00:08","modified_gmt":"2020-12-02T11:00:08","slug":"these-new-metrics-help-grade-ai-models-trustworthiness","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=1523","title":{"rendered":"These new metrics help grade AI models\u2019 trustworthiness"},"content":{"rendered":"\n<p>Whether it\u2019s diagnosing patients or driving cars, we want to know whether we can trust a person before assigning them a sensitive task. In the human world, we have different ways to establish and measure trustworthiness. In artificial intelligence, the means of establishing trust are still developing.<\/p>\n<p>In recent years, deep learning has proven remarkably good at difficult tasks in computer vision, natural language processing, and other fields that were previously off-limits for computers. But we also have ample proof that placing blind trust in AI algorithms is a recipe for disaster: self-driving cars that <a href=\"https:\/\/arstechnica.com\/cars\/2019\/03\/dashcam-video-shows-tesla-steering-toward-lane-divider-again\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">miss lane dividers<\/a>, melanoma detectors that <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0022202X18322930\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">look for ruler marks<\/a> instead of malignant skin patterns, and <a href=\"https:\/\/www.reuters.com\/article\/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">hiring algorithms<\/a> that discriminate against women are just a few of the many incidents reported in recent years.<\/p>\n<p>Recent work by scientists at the University of Waterloo 
and Darwin AI, a Toronto-based AI company, provides new metrics to measure the trustworthiness of deep learning systems in an intuitive and interpretable way. Trust is often a subjective issue, but their research, presented in two papers, offers clear guidelines on what to look for when evaluating the range of situations in which AI models can and can\u2019t be trusted.<\/p>\n<h2>How far do you trust machine learning?<\/h2>\n<p>For many years, machine learning researchers have measured the trustworthiness of their models through metrics such as accuracy, precision, and F1 score. These metrics compare the number of correct and incorrect predictions made by a machine learning model in various ways. They can answer important questions, such as whether a model is making random guesses or has actually learned something. 
But counting the number of correct predictions doesn\u2019t necessarily tell you whether a machine learning model is doing its job correctly.<\/p>\n<figure class=\"wp-block-image size-large\"><img src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/11\/confusion-matrix.jpg?resize=696%2C171&amp;ssl=1\" alt=\"confusion matrix\" width=\"696\" height=\"171\"><figcaption>Confusion matrices present the ratio of right and wrong predictions made by machine learning models.<\/figcaption><\/figure>\n<p>More recently, the field has shown a growing interest in <a href=\"https:\/\/bdtechtalks.com\/2018\/09\/25\/explainable-interpretable-ai\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">explainability<\/a>, a set of techniques that try to interpret the decisions made by <a href=\"https:\/\/bdtechtalks.com\/2019\/08\/05\/what-is-artificial-neural-network-ann\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">deep neural networks<\/a>. Some techniques highlight the pixels that contributed to a deep learning model\u2019s output. For instance, if your convolutional neural network has classified an image as \u201csheep,\u201d explainability techniques can help you figure out whether the network has learned to detect sheep or is simply classifying patches of grass as sheep.<\/p>\n<p>Explainability techniques can help you make sense of how a deep learning model works, but not when and where it can and can\u2019t be trusted.<\/p>\n<figure class=\"wp-block-image size-large\"><img src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2018\/10\/RISE-explainable-AI-example-saliency-map.png?resize=696%2C524&amp;ssl=1\" alt=\"RISE explainable AI example saliency map\" width=\"696\" height=\"524\"><figcaption>Examples of saliency maps produced by RISE<\/figcaption><\/figure>\n<p>In their <a href=\"https:\/\/arxiv.org\/abs\/2009.05835\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">first paper<\/a>, titled \u201cHow Much Can We Really Trust You? 
Towards Simple, Interpretable Trust Quantification Metrics for Deep Neural Networks,\u201d the AI researchers at Darwin AI and the University of Waterloo introduce four new metrics for \u201cassessing the overall trustworthiness of deep neural networks based on their behavior when answering a set of questions.\u201d<\/p>\n<p>While there is other research on measuring trust, these four metrics were designed to be practical for everyday use. On the one hand, developers and users of AI systems should be able to compute them continuously to monitor the areas in which their deep learning models can\u2019t be trusted. On the other hand, the metrics should be simple and interpretable.<\/p>\n<p>In the <a href=\"https:\/\/arxiv.org\/abs\/2009.14701\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">second paper<\/a>, titled \u201cWhere Does Trust Break Down? A Quantitative Trust Analysis of Deep Neural Networks via Trust Matrix and Conditional Trust Densities,\u201d the researchers introduce the \u201ctrust matrix,\u201d a visual representation of the trust metrics across different tasks.<\/p>\n<h2>Overconfident or overcautious?<\/h2>\n<p>Consider two types of people: one who is overly confident in their wrong decisions, and another who is too hesitant about their right decisions. Both would be untrustworthy partners. We all like to work with people who strike a balance: confident about their right answers, and aware of when a task is beyond their abilities.<\/p>\n<p>In this regard, machine learning systems are not very different from humans. If a neural network classifies a stop sign as a speed limit sign with 99 percent confidence, then you probably shouldn\u2019t install it in your self-driving car. 
Likewise, if another neural network is only 30 percent confident that it is standing on a road, then it wouldn\u2019t help much in driving your car.<\/p>\n<p>\u201cQuestion-answer trust,\u201d the first metric introduced by the researchers, measures an AI model\u2019s confidence in its right and wrong answers. Like classical metrics, it takes into account the number of right and wrong predictions a machine learning model makes, but it also factors in their confidence scores to penalize overconfidence and overcautiousness.<\/p>\n<p>Say your machine learning model must classify nine photos and determine which ones contain cats. The question-answer trust metric rewards every right classification by a factor of its confidence score, so higher confidence scores receive a higher reward. But the metric also rewards wrong answers by the complement of the confidence score (i.e., 100% minus the confidence score). So a low confidence score on a wrong classification can earn as much reward as high confidence on a right classification.<\/p>\n<figure class=\"wp-block-image size-large\"><img src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/11\/cats-and-dogs.jpg?resize=600%2C600&amp;ssl=1\" alt=\"cats and dogs\" width=\"600\" height=\"600\"><figcaption>The two behaviors that receive the least reward are high confidence in wrong predictions and low confidence in right predictions.<\/figcaption><\/figure>\n<p>What\u2019s interesting about this metric is that, unlike precision and accuracy scores, it\u2019s not about how many right predictions your machine learning model makes; after all, nobody is perfect. It\u2019s about how trustworthy the model\u2019s predictions are.<\/p>\n<h2>Setting up a hierarchy of trust scores<\/h2>\n<p>Question-answer trust enables us to measure the trust level of individual outputs made by our deep learning models. In their paper, the researchers expand on this notion and provide three more metrics that enable us to evaluate the overall trust level of a machine learning model.<\/p>\n<p>The first, \u201ctrust density,\u201d measures the trust level of a model on a specific output class. Say you have a neural network trained to detect 20 different types of pictures, but you want to measure its overall trust level on the class \u201ccat.\u201d Trust density visualizes the distribution of the machine learning model\u2019s question-answer trust for \u201ccat\u201d across multiple examples. 
A strong model should show higher density toward the right (question-answer trust = 1.0) and lower density toward the left (question-answer trust = 0.0).<\/p>\n<figure class=\"wp-block-image size-large\"><img src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/11\/image.png?resize=640%2C480&amp;ssl=1\" alt=\"trust density\" width=\"640\" height=\"480\"><figcaption>Trust density<\/figcaption><\/figure>\n<p>The second metric, \u201ctrust spectrum,\u201d further zooms out and measures the model\u2019s trustworthiness across different classes when tested on a finite set of inputs. When visualized, the trust spectrum provides a nice overview of where you can and can\u2019t trust a machine learning model. 
For instance, the following trust spectrum shows that our neural network, in this case ResNet-50, can be trusted to detect teapots and school buses, but not screens and monitors.<\/p>\n<figure class=\"wp-block-image size-large\"><img src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/11\/image-1.png?resize=696%2C232&amp;ssl=1\" alt=\"Trust spectrum\" width=\"696\" height=\"232\"><figcaption>Trust spectrum<\/figcaption><\/figure>\n<p>Finally, the \u201cNetTrustScore\u201d summarizes the information of the trust spectrum into a single metric. \u201cFrom an interpretation perspective, the proposed NetTrustScore is fundamentally a quantitative score that indicates how well placed the deep neural network\u2019s confidence is expected to be under all possible answer scenarios that can occur,\u201d the researchers write.<\/p>\n<h2>The machine learning trust matrix<\/h2>\n<p>In their complementary paper, the AI researchers introduce the trust matrix, a visual aid that gives a quick glimpse of the overall trust level of a machine learning model. Basically, the trust matrix is a grid that maps the outputs of a machine learning model to their actual values and the trust level. 
The vertical axis represents the \u201coracle,\u201d the known values of the inputs provided to the machine learning model. The horizontal axis is the prediction made by the model. Each square represents a test case: its position on the horizontal axis is the model\u2019s output, and its position on the vertical axis is the actual value. The color of each square shows its trust level, with bright colors representing high trust and dark colors representing low trust.<\/p>\n<figure class=\"wp-block-image size-large\"><img src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/11\/image-2.png?resize=696%2C557&amp;ssl=1\" alt=\"Trust matrix\" width=\"696\" height=\"557\"><\/figure>\n<p>A perfect model should have bright-colored squares across the diagonal going from the top-left to the bottom-right, where predictions and ground truth cross paths. A trustworthy model can have squares that are off the diagonal, but those squares should be colored brightly as well. A bad model will quickly show itself with dark-colored squares.<\/p>\n<p>For instance, the red circle represents a \u201cswitch\u201d that was predicted as a \u201cstreet sign\u201d by the machine learning model with a low trust score. This means that the model was very confident it was seeing a street sign while in reality, it was looking at a switch. 
On the other hand, the pink circle represents a high trust level for a \u201cwater bottle\u201d that was classified as a \u201claptop.\u201d Here, the model assigned a low confidence score to its prediction, signaling that it was doubtful of its own (incorrect) classification.<\/p>\n<h2>Putting trust metrics to use<\/h2>\n<p>The hierarchical structure of the trust metrics proposed in the papers makes them practical. When choosing a machine learning model for a task, you can shortlist candidates by reviewing their NetTrustScores and trust matrices, then investigate the remaining candidates by comparing their trust spectrums across multiple classes, and finally compare their performance on single classes with the trust density score.<\/p>\n<div class=\"wp-block-image\" readability=\"6\">\n<figure class=\"aligncenter size-large is-resized\" readability=\"2\">\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-8845\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/11\/comparing-metrics.jpg?resize=696%2C523&amp;ssl=1\" alt=\"comparing metrics\" width=\"696\" height=\"523\"><figcaption>Using trust metrics to compare different machine learning models<\/figcaption><\/figure>\n<\/p>\n<\/figure>\n<\/div>\n<p>The trust metrics can help you quickly find the best model for your task, or pinpoint areas where your model needs improvement.<\/p>\n<p>Like many areas of machine learning, this is a work in progress. In their current form, the trust metrics apply only to a limited set of supervised learning problems, namely classification tasks. In the future, the researchers plan to expand the work to cover other kinds of tasks such as object detection, speech recognition, and time series prediction. 
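The model-selection workflow the article outlines (shortlist by overall NetTrustScore, then break ties with per-class trust) might be sketched as follows. Both `net_trust_score` and `class_trust` are hypothetical stand-ins for the metrics, not an API from the papers.

```python
def shortlist(models, net_trust_score, threshold=0.8):
    """Keep only models whose overall trust clears a threshold."""
    return [m for m in models if net_trust_score(m) >= threshold]

def pick_best(models, class_trust, critical_classes):
    """Among shortlisted models, prefer the one with the highest
    minimum trust on the classes that matter most for the task."""
    return max(models,
               key=lambda m: min(class_trust(m, c) for c in critical_classes))
```

Ranking by the *minimum* trust over critical classes is one possible design choice: for safety-sensitive tasks, a model's weakest class matters more than its average.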
They will also be exploring trust in unsupervised machine learning algorithms.<\/p>\n<p>\u201cThe proposed metrics are by no means perfect, but the hope is to push the conversation towards better quantitative metrics for evaluating the overall trustworthiness of deep neural networks to help guide practitioners and regulators in producing, deploying, and certifying deep learning solutions that can be trusted to operate in real-world, mission-critical scenarios,\u201d the researchers write.<\/p>\n<p><i><span>This article was originally published by Ben Dickson on <\/span><\/i><a href=\"https:\/\/bdtechtalks.com\/2020\/11\/23\/deep-learning-trust-metrics\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>TechTalks<\/span><\/i><\/a><i><span>, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech and what we need to look out for. You can read the original article <a href=\"https:\/\/bdtechtalks.com\/2020\/11\/23\/deep-learning-trust-metrics\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">here<\/a>.<\/span><\/i><\/p>\n<p class=\"c-post-pubDate\"> Published December 2, 2020 \u2014 11:00 UTC <\/p>\n<p> <a href=\"https:\/\/thenextweb.com\/neural\/2020\/12\/02\/these-new-metrics-help-grade-ai-models-trustworthiness-syndication\/\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Whether it\u2019s diagnosing patients or driving cars, we want to know whether we can trust a person before assigning them a sensitive task. 
In the human world, we have different ways to&#8230;<\/p>\n","protected":false},"author":1,"featured_media":1524,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/1523"}],"collection":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1523"}],"version-history":[{"count":0,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/1523\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/media\/1524"}],"wp:attachment":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1523"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1523"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1523"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}