One programmer has given an AI model 50,000 hours' worth of training in how to play Pokemon Red, producing an algorithm that's capable of exploring the game and building a team to defeat the first gym leader - but not one that can find its way through Mt. Moon or knows better than to keep buying Magikarp. Most of all, the exercise is a fascinating way to get an idea of how machine learning actually works.
As outlined in an extensive video by Peter Whidden, the AI interacts with the game through the usual control inputs on an emulator. It presses a button and looks at the screen to see what happened, the same as a human player would. Whidden set learning sessions at two hours' worth of game time apiece, though with emulation sped up, each session could be completed in around six minutes of real time - and the process was accelerated further by running 40 testing sessions simultaneously.
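To put those figures in perspective, here is a back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the function and constant names are made up for illustration, and the article does not say that every training run used exactly this setup.

```python
# Figures quoted in the article; the names are illustrative only.
GAME_HOURS_PER_SESSION = 2.0
REAL_MINUTES_PER_SESSION = 6.0
PARALLEL_SESSIONS = 40


def emulation_speedup(game_hours: float, real_minutes: float) -> float:
    """How many times faster than real time one emulated session runs."""
    return (game_hours * 60.0) / real_minutes


def game_hours_per_real_hour(game_hours: float, real_minutes: float, parallel: int) -> float:
    """Total game time accumulated per real hour across all parallel sessions."""
    return emulation_speedup(game_hours, real_minutes) * parallel


speedup = emulation_speedup(GAME_HOURS_PER_SESSION, REAL_MINUTES_PER_SESSION)
throughput = game_hours_per_real_hour(
    GAME_HOURS_PER_SESSION, REAL_MINUTES_PER_SESSION, PARALLEL_SESSIONS
)

print(f"Each session runs at about {speedup:.0f}x real time")
print(f"All sessions together play about {throughput:.0f} game hours per real hour")
```

In other words, each emulated session runs at roughly 20 times normal speed, and 40 of them side by side rack up around 800 hours of play for every hour on the clock.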
Since a machine learning algorithm doesn't inherently care about beating a video game, Whidden set up particular goals for the AI to be rewarded for. To encourage curious exploration, the AI got a reward point whenever it saw something new, as measured by noticeably different pixels appearing on-screen. That had some unintended consequences - the AI would just stare, fascinated, at the slight animation of water, for example - but it broadly served to motivate the computer to make it from Pallet Town through Viridian Forest and up to Pewter City, where the first gym battle against Brock takes place.
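For a rough idea of how a "seen something new" reward might work, here is a minimal Python sketch. It is not Whidden's actual code: the `NoveltyReward` class, the mean-absolute-difference measure, the threshold values, and the tiny four-pixel "screens" are all assumptions chosen to keep the example short.

```python
from typing import List, Sequence, Tuple


class NoveltyReward:
    """Gives one point for any screen that differs enough from every screen seen so far.

    Hypothetical sketch: the distance measure and threshold are assumptions,
    not details taken from the original project.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.seen: List[Tuple[int, ...]] = []

    @staticmethod
    def _distance(a: Sequence[int], b: Sequence[int]) -> float:
        # Mean absolute difference between two equally sized pixel lists.
        if not a or len(a) != len(b):
            raise ValueError("frames must be non-empty and the same size")
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    def score(self, frame: Sequence[int]) -> int:
        # A frame counts as new only if it is far from every remembered frame.
        if all(self._distance(frame, old) > self.threshold for old in self.seen):
            self.seen.append(tuple(frame))
            return 1
        return 0


# Toy grayscale "screens" of four pixels each.
grass = [10, 10, 10, 10]
water_a = [200, 200, 200, 200]
water_b = [200, 205, 200, 205]  # the same water, one animation frame later

strict = NoveltyReward(threshold=5.0)
strict_rewards = [strict.score(f) for f in (grass, water_a, water_b, grass)]

sensitive = NoveltyReward(threshold=1.0)
sensitive_rewards = [sensitive.score(f) for f in (grass, water_a, water_b, grass)]

print(strict_rewards)
print(sensitive_rewards)
```

With the stricter threshold, the second water frame is treated as old news. With the more sensitive one, the tiny shimmer counts as a brand-new sight and earns a point, which is the kind of loophole that could leave an AI gazing at a pond.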
The AI needed further rewards and punishments, too. With rewards all tied up in seeing new things, the AI just wanted to keep moving forward, which meant it didn't care about fighting battles or catching Pokemon, so it initially just ran away from every
Read more on gamesradar.com