IMITATION BALL
Fall '22
Imitation Ball is a VR game built on a 3D Pong setup, designed to explore how game NPCs can be developed with reinforcement learning (RL) techniques. The project explores this intersection of machine learning and game development by giving humans and AI equal control over the same paddle mechanic. As a result, tuning training parameters and environment conditions produced different behavioral qualities in the enemy AI, distinct from those of traditional heuristic approaches to developing NPCs.
AI development also combined imitation learning with RL. With imitation learning, a developer can record a dataset of gameplay demonstrations to guide an otherwise random agent toward learning how to play the game and, potentially, toward certain behaviors. Using generative adversarial imitation learning (GAIL) with the on-policy RL algorithm proximal policy optimization (PPO) resulted in agents that both learn to maximize rewards by playing the game and play similarly to the given demonstrations.
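As a rough illustration of how an imitation signal and a task reward can be combined, the sketch below blends an environment reward with a GAIL-style reward derived from a discriminator's output. The function names, the two strength weights, and the `-log(1 - D)` reward form are assumptions chosen for illustration (the latter is one common formulation), not the project's actual training code.

```python
import math


def gail_reward(expert_prob: float, eps: float = 1e-7) -> float:
    """Imitation reward from the discriminator's estimate (0..1) that a
    state-action pair came from the demonstrations.
    Assumed form: -log(1 - D)."""
    d = min(max(expert_prob, eps), 1.0 - eps)  # clamp to keep the log finite
    return -math.log(1.0 - d)


def combined_reward(extrinsic: float, expert_prob: float,
                    extrinsic_strength: float = 1.0,
                    gail_strength: float = 0.5) -> float:
    """Weighted sum of the task reward and the imitation reward.
    Both strengths are hypothetical tuning knobs."""
    return (extrinsic_strength * extrinsic
            + gail_strength * gail_reward(expert_prob))
```

Under these assumptions, raising `gail_strength` relative to `extrinsic_strength` would favor mimicking the demonstrations over scoring points, and lowering it would do the opposite.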
Agent design workflow
Creating agent demonstration data for imitation learning
Parallel agent training environments
Atop the base gameplay mechanics, environmental variations were developed to explore new modes of agent training. These included targets that reward agents for hitting them, to create aiming behaviors; obstacles that obstruct ball rallying; and a new type of agent altogether, the monkey. The monkey is a simpler agent that moves only in the x,y plane in the middle of the hallway and is rewarded for hitting the ball back to whoever last hit it. Paddle and monkey agents can train together, suggesting multi-agent approaches to developing different interacting agents.
From left to right: target training, obstacle training and monkey training
The agent's action space is a 6-length vector made up of agent position (3) and agent Euler rotation (3). When creating observations for the agent, the VR controller position is also translated into a similar 6-length vector to act as the target for the paddle's pose. The agent's observation space is a 13-length vector made up of agent rotation (4), agent position (3), ball position (3), and ball velocity (3).
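To make the vector layouts concrete, here is a minimal sketch that assembles the 13-length observation and splits the 6-length action. The function names and the ordering of components inside each vector are assumptions (they follow the order listed above), and the 4-value rotation is presumed to be a quaternion.

```python
from typing import List, Sequence, Tuple

OBSERVATION_SIZE = 13  # rotation (4) + position (3) + ball position (3) + ball velocity (3)
ACTION_SIZE = 6        # position (3) + Euler rotation (3)


def build_observation(agent_rotation: Sequence[float],
                      agent_position: Sequence[float],
                      ball_position: Sequence[float],
                      ball_velocity: Sequence[float]) -> List[float]:
    """Concatenate the four components into one flat observation vector."""
    parts = [(agent_rotation, 4), (agent_position, 3),
             (ball_position, 3), (ball_velocity, 3)]
    obs: List[float] = []
    for values, expected in parts:
        if len(values) != expected:
            raise ValueError(f"expected {expected} values, got {len(values)}")
        obs.extend(float(v) for v in values)
    return obs


def pose_to_target(position: Sequence[float],
                   euler: Sequence[float]) -> List[float]:
    """Pack a controller pose into the same 6-length layout as an action."""
    if len(position) != 3 or len(euler) != 3:
        raise ValueError("position and euler must each have 3 values")
    return [float(v) for v in (*position, *euler)]


def split_action(action: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a 6-length action into its position and Euler rotation parts."""
    if len(action) != ACTION_SIZE:
        raise ValueError(f"expected {ACTION_SIZE} values, got {len(action)}")
    values = [float(v) for v in action]
    return values[:3], values[3:]
```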
For agent rewards, the base agent in an empty hallway is rewarded as follows:
- +0.5 for making contact with the ball
- +2 for contacting the ball on a central region of the paddle
- -2 for conceding a life
- +0.2 for maintaining a centered x,y position in the hallway
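The reward scheme listed above could be sketched as a single function over per-step events. The `StepEvents` type and its field names are hypothetical, and two details are assumptions the list does not settle: that the central-hit bonus is added on top of the contact reward, and that the centering reward is granted on every step the agent stays centered.

```python
from dataclasses import dataclass


@dataclass
class StepEvents:
    """Hypothetical flags describing what happened during one step."""
    hit_ball: bool = False
    hit_center: bool = False           # contact on the paddle's central region
    conceded_life: bool = False
    centered_in_hallway: bool = False  # agent near the hallway's x,y center


def base_reward(events: StepEvents) -> float:
    """Sum the base agent's reward terms for one step."""
    reward = 0.0
    if events.hit_ball:
        reward += 0.5
    if events.hit_center:
        reward += 2.0  # assumed additive with the contact reward
    if events.conceded_life:
        reward -= 2.0
    if events.centered_in_hallway:
        reward += 0.2
    return reward
```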
(left) Agent textures and expressions, (right) monkey-in-the-middle model
Agent expression system
Part of the project's ambition, and one of its outcomes, was having agents of different difficulties with playstyles to match. By tuning agent action parameters, such as how fast the agents move and how freely they can rotate, as well as training parameters such as the balance of reward signals between GAIL and PPO and the total training time, I was able to get three distinct paddle AIs.
Another part of this project was to extend these trained agents so that they shape the gameplay design itself. When considering the impact that a more behavior-aware NPC might have in a game or VR context, how it is personified is important to address. For the art direction in this game, we decided to keep the notion of AI abstract while still providing uniquely designed expressions that respond to the nature of the trained behaviors.
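One way such action parameters might be exposed is as per-agent limits on how far a single action can move or rotate the paddle. The sketch below is purely illustrative: `ActionLimits`, `apply_action`, the normalized [-1, 1] action range, and the treatment of actions as per-step deltas are all assumptions, not details taken from the project.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ActionLimits:
    """Hypothetical per-agent caps that shape difficulty."""
    max_speed: float      # distance units per second
    max_turn_rate: float  # degrees per second


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def apply_action(position: Sequence[float], euler: Sequence[float],
                 action: Sequence[float], limits: ActionLimits,
                 dt: float) -> Tuple[List[float], List[float]]:
    """Apply a 6-length action (assumed normalized to [-1, 1]) as a pose
    delta scaled by the agent's limits and the time step."""
    if len(action) != 6:
        raise ValueError("action must have 6 values")
    move = [clamp(a, -1.0, 1.0) * limits.max_speed * dt for a in action[:3]]
    turn = [clamp(a, -1.0, 1.0) * limits.max_turn_rate * dt for a in action[3:]]
    new_position = [p + m for p, m in zip(position, move)]
    new_euler = [(r + t) % 360.0 for r, t in zip(euler, turn)]  # wrap angles
    return new_position, new_euler
```

With a setup like this, a slower or less agile agent would simply be one given smaller `max_speed` and `max_turn_rate` values before training.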
Daimian training runs using subsequent network weights per iteration for better rewards
Monkey training reflecting faster convergence given a simpler task
In conclusion, the final project succeeded in starting a conversation about how to extend the impact machine learning could have on developing VR games and applications. Imitation Ball builds enough of a bridge between game development and RL that the remaining question is how to take it further. Two key directions would advance this work: first, exploring more complex gameplay to expose more nuanced behaviors, and second, exploring in greater detail the impact of imitation learning on crafting NPC behaviors. Using machine learning to design subjective behaviors for immersive experiences will have pressing implications for the future, and it is a particular interest of mine to keep exploring.