{"id":8813,"date":"2021-11-07T16:01:18","date_gmt":"2021-11-07T16:01:18","guid":{"rendered":"http:\/\/TheNextWeb=1372338"},"modified":"2021-11-07T16:01:18","modified_gmt":"2021-11-07T16:01:18","slug":"reinforcement-learning-makes-for-shitty-ai-teammates-in-co-op-games","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=8813","title":{"rendered":"Reinforcement learning makes for shitty AI teammates in co-op games"},"content":{"rendered":"\n<p><em>This article is part of our <a href=\"https:\/\/bdtechtalks.com\/tag\/ai-research-papers\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">reviews of AI research papers<\/a>, a series of posts that explore the latest findings in artificial intelligence.<\/em><\/p>\n<p>Artificial intelligence has proven that <a href=\"https:\/\/bdtechtalks.com\/2018\/07\/02\/ai-plays-chess-go-poker-video-games\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">complicated board and video games<\/a> are no longer the exclusive domain of the human mind. From chess to Go to StarCraft, AI systems that use reinforcement learning algorithms have outperformed human world champions in recent years.<\/p>\n<p>But despite the high individual performance of RL agents, they can become frustrating teammates when paired with human players, according to a study by AI researchers at MIT Lincoln Laboratory. The study, which involved cooperation between humans and AI agents in the card game Hanabi, shows that players prefer the classic and predictable rule-based AI systems over complex RL systems.<\/p>\n<p>The findings, presented in a <a href=\"https:\/\/arxiv.org\/abs\/2107.07630\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">paper published on arXiv<\/a>, highlight some of the underexplored challenges of applying reinforcement learning to real-world situations and can have important implications for the future development of AI systems that are meant to cooperate with humans.<\/p>\n<h2>Finding the gap in reinforcement learning<\/h2>\n<p><a href=\"https:\/\/bdtechtalks.com\/2021\/09\/02\/deep-reinforcement-learning-explainer\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Deep reinforcement learning<\/a>, the algorithm used by state-of-the-art game-playing bots, starts by providing an agent with a set of possible actions in the game, a mechanism to receive feedback from the environment, and a goal to pursue. Then, through numerous episodes of gameplay, the RL agent gradually goes from taking random actions to learning sequences of actions that can help it maximize its goal.<\/p>\n<p>Early research of deep reinforcement learning relied on the agent being pretrained on gameplay data from human players. More recently, researchers have been able to develop RL agents that can learn games from scratch through pure self-play <a href=\"https:\/\/bdtechtalks.com\/2019\/01\/02\/humanizing-ai-deep-learning-alphazero\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">without human input<\/a>.<\/p>\n<p>In their study, the researchers at MIT Lincoln Laboratory were interested in finding out if a reinforcement learning program that outperforms humans could become a reliable coworker to humans.<\/p>\n<p>\u201cAt a very high level, this work was inspired by the question: What technology gaps exist that prevent reinforcement learning (RL) from being applied to real-world problems, not just video games?\u201d Dr. Ross Allen, AI researcher at Lincoln Laboratory and co-author of the paper, told TechTalks. \u201cWhile many such technology gaps exist (e.g., the real world is characterized by uncertainty\/partial-observability, data scarcity, ambiguous\/nuanced objectives, disparate timescales of decision making, etc.), we identified the need to collaborate with humans as a key technology gap for applying RL in the real-world.\u201d<\/p>\n<h2>Adversarial vs cooperative games<\/h2>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/10\/DOTA-2-reinforcement-learning.jpg?ssl=1\" alt=\"A depiction of reinforcement learning used by an AI in the game Dota 2\" width=\"1920\" height=\"1080\" class=\"js-lazy\"><figcaption><a href=\"https:\/\/thenextweb.com\/news\/reinforcement-learning-makes-for-shitty-ai-teammates-co-op-games-syndication#\" data-url=\"https:\/\/twitter.com\/intent\/tweet?url=https%3A%2F%2Feditorial.thenextweb.com%2Fneural%2F2021%2F11%2F07%2Freinforcement-learning-makes-for-shitty-ai-teammates-co-op-games-syndication%2F&amp;via=thenextweb&amp;related=thenextweb&amp;text=Check out this picture on: A depiction of reinforcement learning used by an AI in the game Dota 2\" data-title=\"Share A depiction of reinforcement learning used by an AI in the game Dota 2 on Twitter\" data-width=\"685\" data-height=\"500\" class=\"post-image-share popitup\" title=\"Share A depiction of reinforcement learning used by an AI in the game Dota 2 on Twitter\"><i class=\"icon icon--inline icon--twitter--dark\"><\/i><\/a>A depiction of reinforcement learning used by an AI in the game <em>Dota 2<\/em><\/figcaption><noscript><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/10\/DOTA-2-reinforcement-learning.jpg?ssl=1\" alt=\"A depiction of reinforcement learning used by an AI in the game Dota 2\" width=\"1920\" height=\"1080\" class><\/noscript><\/figure>\n<p>Recent research mostly applies reinforcement learning to single-player games (e.g., <em>Atari Breakout<\/em>) or adversarial games (e.g., <em>StarCraft, Go<\/em>), where the AI is pitted against a human player or another game-playing bot.<\/p>\n<p>\u201cWe think that reinforcement learning is well suited to address problems on human-AI collaboration for similar reasons that RL has been successful in human-AI competition,\u201d Allen said. \u201cIn competitive domains RL was successful because it avoided the biases and assumptions on how a game should be played, instead learning all of this from scratch.\u201d<\/p>\n<p>In fact, in some cases, the reinforcement systems have managed to hack the games and find tricks that baffled even the most talented and experienced human players. One famous example was a move made by DeepMind\u2019s AlphaGo in its matchup against Go world champion Lee Sedol. Analysts first thought the move was a mistake because it went against the intuitions of human experts. But the same move ended up turning the tide in favor of the AI player and defeating Sedol. Allen thinks the same kind of ingenuity can come into play when RL is teamed up with humans.<\/p>\n<p>\u201cWe think RL can be leveraged to advance the state of the art of human-AI collaboration by avoiding the preconceived assumptions and biases that characterize \u2018rule-based expert systems,\u201d Allen said.<\/p>\n<p>For their experiments, the researchers chose <em>Hanabi<\/em>, a card game in which two to five players must cooperate to play their cards in a specific order. <em>Hanabi<\/em> is especially interesting because while simple, it is also a game of full cooperation and limited information. Players must hold their cards backward and can\u2019t see their faces. Accordingly, each player can see the faces of their teammates\u2019 cards. Players can use a limited number of tokens to provide each other clues about the cards they\u2019re holding. Players must use the information they see on their teammates\u2019 hands and the limited hints they know about their own hand to develop a winning strategy.<\/p>\n<figure>\n<p><iframe loading=\"lazy\" src=\"\/\/www.youtube.com\/embed\/FZlk3rHbPcI\" height=\"240\" width=\"320\" allowfullscreen frameborder=\"0\">[embedded content]<\/iframe><\/p>\n<\/figure>\n<p><!--resp-video-container--><\/p>\n<p>\u201cIn the pursuit of real-world problems, we have to start simple,\u201d Allen said. \u201cThus we focus on the benchmark collaborative game of <em>Hanabi<\/em>.\u201d<\/p>\n<p>In recent years, several research teams have explored the development of AI bots that can play <em>Hanabi<\/em>. Some of these agents use <a href=\"https:\/\/bdtechtalks.com\/2019\/11\/18\/what-is-symbolic-artificial-intelligence\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">symbolic AI<\/a>, where the engineers provide the rules of gameplay beforehand, while others use reinforcement learning.<\/p>\n<p>The AI systems are rated based on their performance in self-play (where the agent plays with a copy of itself), cross-play (where the agent is teamed with other types of agents), and human-play (the agent is cooperates with a human).<\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/11\/Hanabi-reinforcement-learning-and-symbolic-AI-systems.jpg?resize=696%2C382&amp;ssl=1\" alt=\"Hanabi reinforcement learning and symbolic AI systems\" width=\"696\" height=\"382\" class=\"js-lazy\"><figcaption><a href=\"https:\/\/thenextweb.com\/news\/reinforcement-learning-makes-for-shitty-ai-teammates-co-op-games-syndication#\" data-url=\"https:\/\/twitter.com\/intent\/tweet?url=https%3A%2F%2Feditorial.thenextweb.com%2Fneural%2F2021%2F11%2F07%2Freinforcement-learning-makes-for-shitty-ai-teammates-co-op-games-syndication%2F&amp;via=thenextweb&amp;related=thenextweb&amp;text=Check out this picture on: Hanabi-reinforcement-learning-and-symbolic-AI-systems\" data-title=\"Share Hanabi-reinforcement-learning-and-symbolic-AI-systems on Twitter\" data-width=\"685\" data-height=\"500\" class=\"post-image-share popitup\" title=\"Share Hanabi-reinforcement-learning-and-symbolic-AI-systems on Twitter\"><i class=\"icon icon--inline icon--twitter--dark\"><\/i><\/a>Hanabi-reinforcement-learning-and-symbolic-AI-systems<\/figcaption><noscript><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/11\/Hanabi-reinforcement-learning-and-symbolic-AI-systems.jpg?resize=696%2C382&amp;ssl=1\" alt=\"Hanabi reinforcement learning and symbolic AI systems\" width=\"696\" height=\"382\" class><\/noscript><\/figure>\n<p>\u201cCross-play with humans, referred to as human-play, is of particular importance as it measures human-machine teaming and is the foundation for the experiments in our paper,\u201d the researchers write.<\/p>\n<p>To test the efficiency of human-AI cooperation, the researchers used <a href=\"https:\/\/github.com\/Quuxplusone\/Hanabi\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">SmartBot<\/a>, the top-performing rule-based AI system in self-play, and <a href=\"https:\/\/arxiv.org\/abs\/1912.02288\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Other-Play<\/a>, a <em>Hanabi<\/em> bot that ranked highest in cross-play and human-play among RL algorithms.<\/p>\n<p>\u201cThis work directly extends previous work on RL for training <em><span>Hanabi<\/span><\/em>agents. In particular we study the \u2018Other Play\u2019 RL agent from Jakob Foerster\u2019s lab,\u201d Allen said. \u201cThis agent was trained in such a way that made it particularly well suited for collaborating with other agents it had not met during training. It had produced state-of-the-art performance in <em><span>Hanabi<\/span><\/em>when teamed with other AI it had not met during training.\u201d<\/p>\n<h2>Human-AI cooperation<\/h2>\n<p>In the experiments, human participants played several games of <em>Hanabi<\/em> with an AI teammate. The players were exposed to both SmartBot and Other-Play but weren\u2019t told which algorithm was working behind the scenes.<\/p>\n<p>The researchers evaluated the level of human-AI cooperation based on objective and subjective metrics. Objective metrics include scores, error rates, etc. Subjective metrics include the experience of the human players, including the level of trust and comfort they feel in their AI teammate, and their ability to understand the AI\u2019s motives and predict its behavior.<\/p>\n<p>There was no significant difference in the objective performance of the two AI agents. But the researchers expected the human players to have a more positive subjective experience with Other-Play, since it had been trained to cooperate with agents other than itself.<\/p>\n<p>\u201cOur results were surprising to us because of how strongly human participants reacted to teaming with the Other Play agent. In short, they hated it,\u201d Allen said.<\/p>\n<p>According to the surveys from the participants, the more experienced <em>Hanabi<\/em> players had a poorer experience with Other-Play RL algorithm in comparison to the rule-based SmartBot agent. One of the key points to success in <em>Hanabi<\/em> is the skill of providing subtle hints to other players. For example, say the \u201cone of squares\u201d card is laid on the table and your teammate holds the two of squares in his hand. By pointing at the card and saying \u201cthis is a two\u201d or \u201cthis is a square,\u201d you\u2019re implicitly telling your teammate to play that card without giving him full information about the card. An experienced player would catch on the hint immediately. But providing the same kind of information to the AI teammate proves to be much more difficult.<\/p>\n<p>\u201cI gave him information and he just throws it away,\u201d one participant said after being frustrated with the Other-Play agent, according to the paper. Another said, \u201cAt this point, I don\u2019t know what the point is.\u201d<\/p>\n<p>Interestingly, Other-Play is designed to avoid the creation of \u201csecretive\u201d conventions that RL agents develop when they only go through self-play. This makes Other-Play an optimal teammate for AI algorithms that weren\u2019t part of its training regime. But it still has assumptions about the types of teammates it will encounter, the researchers note.<\/p>\n<p>\u201cNotably, [Other-Play] assumes that teammates are also optimized for zero-shot coordination. In contrast, human <em>Hanabi<\/em> players typically do not learn with this assumption. Pre-game convention-setting and post-game reviews are common practices for human <em>Hanabi<\/em> players, making human learning more akin to few-shot coordination,\u201d the researchers note in their paper.<\/p>\n<h2>Implications for future AI systems<\/h2>\n<p>\u201cOur current findings give evidence that an AI\u2019s objective task performance alone (what we refer to as \u2018self-play\u2019 and \u2018cross-play\u2019 in the paper) may not correlate to human trust and preference when collaborating with that AI,\u201d Allen said. \u201cThis raises the question: what objective metrics do correlate to subjective human preferences? Given the huge amount of data needed to train RL-based agents, it\u2019s not really tenable to train with humans in the loop. Therefore, if we want to train AI agents that are accepted and valued by human collaborators, we likely need to find trainable objective functions that can act as surrogates to, or strongly correlate with, human preferences.\u201d<\/p>\n<p>Meanwhile, Allen warns against extrapolating the results of the <em>Hanabi<\/em> experiment to other environments, games, or domains that they have not been able to test. The paper also acknowledges some of the limits in the experiments, which the researchers are working to address in the future. For example, the subject pool was small (29 participants) and skewed toward people who were skilled in <em>Hanabi<\/em>, which implies that they had predefined behavioral expectations from the AI teammate and were more likely to have a negative experience with the eccentric behavior of the RL agent.<\/p>\n<p>Nonetheless, the results can have important implications for the future of reinforcement learning research.<\/p>\n<p>\u201cIf state-of-the-art RL agents can\u2019t even make an acceptable collaborator in a game as constrained and narrow scope as <em>Hanabi<\/em>; should we really expect that same RL techniques to \u2018just work\u2019 when applied to more complicated, nuanced, consequential games and real-world situations?\u201d Allen said. \u201cThere is a lot of buzz about reinforcement learning within tech and academic fields; and rightfully so. However, I think our findings show that the remarkable performance of RL systems shouldn\u2019t be taken for granted in all possible applications.\u201d<\/p>\n<p>For example, it might be easy to assume that RL is could be used to train robotic agents capable of close collaboration with humans. But the results from the work done at MIT Lincoln Laboratory suggests the contrary, at least given the current state of the art, Allen says.<\/p>\n<p>\u201cOur results seem to imply that much more theoretical and applied work is needed before learning-based agents will be effective collaborators in complicated situations like human-robot interactions,\u201d he said.<\/p>\n<p><em>This article was originally published by Ben Dickson on <a href=\"https:\/\/bdtechtalks.com\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">TechTalks<\/a>, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article <a href=\"https:\/\/bdtechtalks.com\/2021\/11\/01\/reinforcement-learning-hanabi\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">here<\/a>.<\/em><\/p>\n<p> <a href=\"https:\/\/thenextweb.com\/news\/reinforcement-learning-makes-for-shitty-ai-teammates-co-op-games-syndication\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence. Artificial intelligence has proven that complicated board and video&#8230;<\/p>\n","protected":false},"author":1,"featured_media":8814,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/8813"}],"collection":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8813"}],"version-history":[{"count":0,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/8813\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/media\/8814"}],"wp:attachment":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}