{"id":10435,"date":"2022-03-02T16:11:45","date_gmt":"2022-03-02T16:11:45","guid":{"rendered":"http:\/\/TheNextWeb=1381678"},"modified":"2022-03-02T16:11:45","modified_gmt":"2022-03-02T16:11:45","slug":"reinforcement-learning-how-rewards-create-intelligent-machines","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=10435","title":{"rendered":"Reinforcement learning: How rewards create intelligent machines"},"content":{"rendered":"\n<div><img decoding=\"async\" src=\"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&amp;fit=1280%2C640&amp;url=https%3A%2F%2Fcdn0.tnwcdn.com%2Fwp-content%2Fblogs.dir%2F1%2Ffiles%2F2022%2F03%2FUntitled-design-1.jpg&amp;signature=840670943ed2cc84f53f7265c443186a\" class=\"ff-og-image-inserted\"><\/div>\n<p><span>In June 2021, scientists at the AI lab DeepMind made a controversial claim. The researchers suggested that we could reach artificial general intelligence (AGI) using one single approach: <\/span><a href=\"https:\/\/thenextweb.com\/news\/what-the-hell-is-reinforcement-learning-and-how-does-it-work-syndication\" target=\"_blank\" rel=\"noopener noreferrer\"><span>reinforcement learning<\/span><\/a><span>. 
They titled their paper on the subject: \u201c<\/span><a href=\"https:\/\/deepmind.com\/research\/publications\/2021\/Reward-is-Enough\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><span>Reward is Enough<\/span><\/a><span>.\u201d<\/span><\/p>\n<p><span>The team argued that AGI could emerge through an incentive mechanism known as a reward function.<\/span><\/p>\n<p><span>\u201cWe hypothesize that intelligence, and its associated abilities, can be understood as subserving the maximization of reward,\u201d the study authors wrote.<\/span><\/p>\n<p><span>Their claims have been dismissed by some scientists, but they nonetheless shine a spotlight on a powerful technique.<\/span><\/p>\n<h2><b>What is reinforcement learning?<\/b><\/h2>\n<p><span>In reinforcement learning (RL), a software agent learns through trial and error. When it takes a desired action, the agent receives a reward.<\/span><\/p>\n<p><span>Over time, the agent works out how to execute the task to maximize its reward.<\/span><\/p>\n<p><span>The technique can be applied to a vast array of tasks, from<\/span><a href=\"https:\/\/www.zdnet.com\/article\/uc-berkeley-robot-navigation-could-chart-a-new-course-for-self-driving-systems\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"> <span>controlling autonomous vehicles<\/span><\/a><span> to<\/span><a href=\"https:\/\/sustainability.google\/progress\/projects\/machine-learning\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"> <span>improving energy efficiency<\/span><\/a><span>. 
But its most celebrated achievements have come in the world of games.<\/span><\/p>\n<p><span>In March 2016, the technique had a landmark moment.&nbsp;<\/span><\/p>\n<p><span>A DeepMind system called AlphaGo became the first computer program to defeat a world champion in Go, a famously complex board game.<\/span><\/p>\n<p><span>The victory<\/span><a href=\"https:\/\/books.google.co.uk\/books?id=Z1FfDwAAQBAJ&amp;pg=PT98&amp;lpg=PT98&amp;dq=200+million+people+watched+alphago&amp;source=bl&amp;ots=If6iGPZcLY&amp;sig=ACfU3U0v0_xt6NDhKqlvREz5-njZwFs3wA&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwjNzP6PkqX2AhWJEMAKHQLhB1kQ6AF6BAgsEAM#v=onepage&amp;q=200%20million%20people%20watched%20alphago&amp;f=false\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"> <span>was reportedly watched<\/span><\/a><span> by over 200 million people.<\/span><\/p>\n<p><span> <\/p>\n<figure>\n<p> <iframe srcdoc=\"\n\n<style>*{padding:0;margin:0;overflow:hidden}html,body{background:#000;height:100%}img{position:absolute;top:0;left:0;width:100%;height:100%;object-fit:cover;transition:opacity .1s cubic-bezier(0.4,0,1,1)}a:hover img+img{opacity:1!important}<\/style>\n<p><a href='https:\/\/www.youtube.com\/embed\/WXuK6gekU1Y?feature=oembed&amp;autoplay=1&amp;mute=1&amp;modestbranding=1&amp;iv_load_policy=3&amp;theme=light&amp;playsinline=1'><img src='https:\/\/img.youtube.com\/vi\/WXuK6gekU1Y\/hqdefault.jpg'><img src='https:\/\/cdn0.tnwcdn.com\/wp-content\/themes\/cyberdelia\/assets\/img\/ytplaybtn.png' style='top: 50%;left:50%;width:68px;height:48px;transform:translate3d(-50%,-50%,0)'><img src='https:\/\/cdn0.tnwcdn.com\/wp-content\/themes\/cyberdelia\/assets\/img\/ytplaybtn-hover.png' style='top: 50%;left:50%;width:68px;height:48px;opacity:0;transform:translate3d(-50%,-50%,0)'><\/a>&#8221; height=&#8221;240&#8243; width=&#8221;320&#8243; allow=&#8221;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&#8221; allowfullscreen frameborder=&#8221;0&#8243;>[embedded 
content]<\/iframe> <\/p>\n<\/figure>\n<p> <!--resp-video-container--><\/span><\/p>\n<p><span>During the match, the AI played unconventional moves that baffled its opponent.<\/span><\/p>\n<p><span>\u201cThe final version of AlphaGo does not use any rules,\u201d said Demis Hassabis, DeepMind co-founder and CEO.<\/span><\/p>\n<p><span>\u201cInstead, it learns the game from scratch by playing against different versions of itself thousands of times, incrementally learning through a process of trial and error, known as reinforcement learning. This means it is free to learn the game for itself, unconstrained by orthodox thinking.\u201d<\/span><\/p>\n<p><span>These constraints were replaced by reward maximization.<\/span><\/p>\n<h2><b>How a reward function works<\/b><\/h2>\n<p><span>Rewards are common learning incentives for animals. A squirrel, for instance, develops intellectual abilities in its search for nuts. A child, meanwhile, may get a chocolate for tidying their room \u2014 or a spank for bad behavior. (<\/span><i><span>Don\u2019t worry \u2014 I don\u2019t have kids<\/span><\/i><span>).<\/span><\/p>\n<p><span>In AI systems, the rewards and punishments are calculated mathematically. A self-driving system could receive a -1 when the model hits a wall, and a +1 if it safely passes another car. 
These signals allow the agent to evaluate its performance.<\/span><\/p>\n<p><span>The <a href=\"https:\/\/thenextweb.com\/topic\/algorithm\" target=\"_blank\" rel=\"noopener noreferrer\">algorithm<\/a> then learns through trial and error to maximize the reward \u2014 and ultimately, complete the task in the most desirable manner.<\/span><\/p>\n<p><span>\u201cBecause it\u2019s learning from interaction in an incremental way, it feels very much like what biological intelligence systems do,\u201d Doina Precup, who leads DeepMind\u2019s Montreal office, <\/span><a href=\"https:\/\/thenextweb.com\/news\/deepmind-reinforcement-learning-only-one-possible-pathway-to-agi\" target=\"_blank\" rel=\"noopener noreferrer\"><span>told TNW<\/span><\/a><span>.<\/span><\/p>\n<p><span>Precup\u2019s colleagues are now developing multi-purpose RL agents.<\/span><\/p>\n<p><span>In 2020, DeepMind unveiled MuZero, a program that figures out the rules of a game it\u2019s never seen before. Eventually, the lab believes such agents could solve multiple problems in the real world.<\/span><\/p>\n<p><span>There are still major challenges to overcome. RL agents struggle to maximize rewards in complex environments and to assess the long-term repercussions of their actions. Nonetheless, the reward-is-enough proponents believe the algorithms\u2019 adaptability could pave a path to AGI.<\/span><\/p>\n<p> <a href=\"https:\/\/thenextweb.com\/news\/how-rewards-work-in-reinforcement-learning-deepmind\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In June 2021, scientists at the AI lab DeepMind made a controversial claim. The researchers suggested that we could reach artificial general intelligence (AGI) using one single approach: reinforcement learning. 
They titled&#8230;<\/p>\n","protected":false},"author":1,"featured_media":10436,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/10435"}],"collection":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10435"}],"version-history":[{"count":0,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/10435\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/media\/10436"}],"wp:attachment":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10435"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10435"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10435"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}