AI models that play games go back decades, but they generally specialize in one game and always play to win. Google DeepMind researchers have a different goal with their latest creation: a model that not only learned to play multiple 3D games like a human, but also does its best to understand and act on your verbal instructions.
There are of course “AI” or computer characters that can do this kind of thing, but they’re more like features of a game: NPCs that you control indirectly through formal in-game commands.
DeepMind’s SIMA (Scalable Instructable Multiworld Agent) doesn’t have any kind of access to the game’s internal code or rules; instead, it was trained on many, many hours of video showing gameplay by humans. From this data, and the annotations provided by data labelers, the model learns to associate certain visual representations with actions, objects, and interactions. The researchers also recorded videos of players instructing one another to do things in game.
For example, it might learn from how the pixels move in a certain pattern on screen that this is an action called “moving forward,” or when the character approaches a door-like object and uses the doorknob-looking object, that’s “opening” a “door.” Simple things like that, tasks or events that take a few seconds but are more than just pressing a key or identifying something.
The training videos were taken in multiple games, from Valheim to Goat Simulator 3, whose developers were involved with and consented to this use of their software. One of the main goals, the researchers said in a call with press, was to see whether training an AI to play one set of games makes it capable of playing others it hasn’t seen, a process called generalization.
The answer is yes, with caveats. AI agents trained on multiple games performed better on games they hadn’t been exposed to. Of course, many games involve specific and unique mechanics or terms that will stymie even the best-prepared AI. But there’s nothing stopping the model