

Spring '23

Continuing from Part 1, this second part of my MSc thesis overview covers the systems and frameworks I developed for evaluating the agent variants and the human demonstration, so that we can make better-informed decisions when designing and choosing RL-trained agents for different needs. In RL, final agents are often evaluated either through the rewards received during training or qualitatively, by watching how they perform in their trained scenario in real time. The shortcoming of relying on rewards alone is that they tell us little about the agent's behavior in applied scenarios. This matters most when we want the agent to behave in a particular way, for instance in a more human-like manner. Likewise, evaluating an agent qualitatively in the field leaves us without quantitative metrics that capture the temporal nature of its actions. In this part of the thesis, I explore methods that begin to address this: constructing a latent space for the agents' actions, and building diagrams for tracking those actions.

Left: Dietmar Offenhuber's proposal for Autographic Visualization of material traces in data

Right: Mnih et al. using t-SNE to project into 2D the latent representations learned by an RL agent playing Space Invaders

I wanted to approach this novel application of RL with an equally distinctive way of exploring agent actions and variants, drawing on my previous work and on that of other researchers, ranging from information design researcher Dietmar Offenhuber to DeepMind's original work in deep RL. What I borrow are techniques for visualizing and treating data so that we can compare and understand the differences between trained agent variants beyond a qualitative analysis of their behavior. Similar to A Walk in Latent City, I consider creating a latent embedding space with t-SNE and PCA, but with visualizations designed to be more contextualized. My goal is to produce a visualization similar to that of Mnih et al., one that connects 3D observations to their corresponding latent action embeddings.
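t-SNE is a nonlinear, stochastic embedding, while PCA is a linear, deterministic projection onto the directions of greatest variance, which makes PCA a useful baseline (or preprocessing step) next to t-SNE. The sketch below is a minimal NumPy PCA, assuming the input is a matrix with one action vector per row; the function name and the placeholder shapes are hypothetical and not taken from the thesis code.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Project the rows of X onto their top principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each kept component (largest first).
    """
    # Center each feature so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return coords, explained[:n_components]


# Placeholder data: 200 recorded steps, 12 action features each (hypothetical shapes).
rng = np.random.default_rng(0)
actions = rng.normal(size=(200, 12))
coords, ratio = pca_project(actions)
print(coords.shape, ratio.round(3))
```

The explained-variance ratios give a quick check on how much of the action structure a 3D view can actually show before committing to a nonlinear method.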

Sample latent action embeddings of the Guiding Agents, visualizing the DataFrame directly in Plotly

These latent action spaces are built by collecting action-observation pairs from each trained agent. The recorded actions of all agent variants are pooled for t-SNE, and their action vectors are projected into three dimensions. In this space, we can already begin to compare the structure of different agent variants' action embeddings. Still, I wanted the visualization to reference the co-creation environment more directly. So, for each action embedding, I use the corresponding observation to reconstruct what the agent saw when it took that action, and render that reconstruction in place of a standard, plain data point.
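Pooling every variant into a single t-SNE fit matters: separate fits would place each variant in its own unrelated coordinate system, so their embeddings could not be compared. A minimal sketch with scikit-learn's `TSNE`, assuming each variant's recorded actions are already stacked into a NumPy array; the function name, the variant names, and the perplexity value are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_variants(actions_by_variant: dict[str, np.ndarray], seed: int = 0):
    """Fit one 3D t-SNE over all variants' action vectors.

    Returns (coords, labels): an (N, 3) array of embeddings and an (N,)
    array naming the variant each row came from.
    """
    labels, blocks = [], []
    for name, acts in actions_by_variant.items():
        blocks.append(acts)
        labels.extend([name] * len(acts))
    X = np.vstack(blocks)
    # Perplexity must be smaller than the number of samples; 30 is a common default.
    tsne = TSNE(n_components=3, perplexity=30.0, init="pca", random_state=seed)
    coords = tsne.fit_transform(X)
    return coords, np.array(labels)


# Hypothetical variants with random placeholder actions (not thesis data).
rng = np.random.default_rng(1)
data = {
    "base": rng.normal(size=(40, 6)),
    "demo": rng.normal(size=(40, 6)) + 3.0,
    "human": rng.normal(size=(40, 6)) - 3.0,
}
coords, labels = embed_variants(data)
```

Fixing `random_state` keeps the layout reproducible between runs, which helps when the same space is revisited in an interface.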

Furthermore, where applicable, the vector fed to the t-SNE model is not a single action vector but a concatenation of its previous three action vectors. The reason is that in RL, and with PPO in particular, the agent learns from sequences of actions rather than from isolated ones. Building this into the latent space emphasizes the sequence of actions the agent takes.
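The concatenation of consecutive actions can be sketched as a sliding window over one episode. This assumes the window ends at the current step (the current action plus the two before it) and that early steps without a full window are skipped rather than padded; both are assumptions, and if the thesis used the three actions strictly before the current one, the window would simply shift by one step.

```python
import numpy as np


def windowed_actions(actions: np.ndarray, window: int = 3) -> np.ndarray:
    """Concatenate each step's action with the actions that precede it.

    actions: (T, action_dim) array for one episode, in time order.
    Returns a (T - window + 1, window * action_dim) array whose row i is
    actions[i], actions[i + 1], ..., actions[i + window - 1], oldest first.
    """
    T, dim = actions.shape
    if T < window:
        # Assumption: episodes too short to fill a window contribute no rows.
        return np.empty((0, window * dim), dtype=actions.dtype)
    # sliding_window_view gives shape (T - window + 1, dim, window); reorder to
    # (steps, window, dim) so each row flattens into consecutive action vectors.
    views = np.lib.stride_tricks.sliding_window_view(actions, window, axis=0)
    return views.transpose(0, 2, 1).reshape(T - window + 1, window * dim)


episode = np.arange(10).reshape(5, 2)  # 5 steps of a 2-D action (placeholder)
print(windowed_actions(episode))
```

Windowing per episode, before pooling, avoids stitching the end of one episode onto the start of the next.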

Left: A representation of a Guiding Agent's binary observation of the Block Matrix

Right: A representation of the Creator Agent's observation of itself in the context of the Block Matrix and its target
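Replacing a plain data point with the agent's observation can be sketched by turning the observation into a small cluster of voxel positions centered on the embedding coordinate. This assumes the Guiding Agent's binary observation is a 3D occupancy grid over the Block Matrix; the function name and the `size` parameter are hypothetical, and rendering the returned points (for example as small square markers) is left to the plotting layer.

```python
import numpy as np


def observation_glyph(obs: np.ndarray, center: np.ndarray, size: float = 1.0) -> np.ndarray:
    """Place a binary 3D observation grid as a voxel glyph at an embedding point.

    obs:    (X, Y, Z) array of 0/1 or bool cells, 1 meaning a block is present.
    center: (3,) latent-space coordinate of the action embedding.
    size:   edge length of the glyph along the grid's longest axis.
    Returns an (N, 3) array with one position per occupied cell.
    """
    occupied = np.argwhere(obs.astype(bool)).astype(float)  # integer cell indices
    grid = np.array(obs.shape, dtype=float)
    # Cell index -> cell center, shifted so the grid is centered on the origin and
    # scaled uniformly by its longest side to preserve the grid's proportions.
    unit = (occupied + 0.5 - grid / 2.0) / grid.max()
    return np.asarray(center, dtype=float) + size * unit


obs = np.zeros((4, 4, 4), dtype=int)
obs[0, 0, 0] = obs[3, 3, 3] = 1  # two placeholder blocks
print(observation_glyph(obs, np.array([10.0, 0.0, 0.0]), size=2.0))
```

Keeping `size` small relative to the spacing of the t-SNE points stops neighboring glyphs from overlapping.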

In my final interface, a user can navigate the 3D latent space much as they would in an application like Unity or Blender. Agent variants can be toggled on and off, and clicking a specific embedding shows its metadata and the preceding action embeddings. My hope is that this interface supports observing RL agents and their variants at several scales, from comparing the macro structure of actions and the similarity between agents, to understanding what makes an agent unique at the level of individual embeddings. Recorded human demonstrations can also be placed in this latent space, so that human and agent actions can be compared directly, quantifying the effect that GAIL and human demonstration have on agent training.

Exploring the Guiding Agent's latent action space

Exploring the Creator Agent's latent action space

To see this in practice, we can first examine the static structure of these action embeddings. What interests me most, and what becomes evident, is the relationship between agents trained with human demonstration, agents trained without it, and the human demonstration itself. Especially in the Creator Agent plot, the action embeddings of the human demonstration and of the agent trained with demonstration sit clearly apart from those of the base agent. This helps gauge how similar each agent variant's behavior is to the human demonstration. I propose that further work on formulating Euclidean metrics in this space could provide more granular comparative values between agent variants.
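Two simple Euclidean comparisons between point sets are sketched below as hypothetical starting points: the distance between set centroids, and the mean distance from each agent point to its nearest demonstrated point. One caveat: t-SNE preserves local neighborhoods but not global distances or cluster sizes, so these numbers are more trustworthy when computed on the original (windowed) action vectors or a PCA projection than on the t-SNE coordinates themselves.

```python
import numpy as np


def centroid_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between the mean vectors of two point sets."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def mean_nn_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Average distance from each point in a to its nearest neighbor in b.

    Asymmetric by design: mean_nn_distance(agent, human) asks how close
    each agent action comes to *some* demonstrated action.
    """
    # Pairwise distances via broadcasting: shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())


a = np.array([[0.0, 0.0], [2.0, 0.0]])
b = np.array([[0.0, 1.0]])
print(centroid_distance(a, b), mean_nn_distance(a, b))
```

Comparing, say, `mean_nn_distance(demo_agent, human)` with `mean_nn_distance(base_agent, human)` (hypothetical arrays) turns the visual separation into a single number per variant. The broadcasted distance matrix grows as len(a) × len(b), so very large recordings would call for a KD-tree instead.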

View of the Creator Agent variants' action embeddings


View of the Guiding Agent variants' action embeddings

Developing the latent space and its visualization was an essential contribution of my thesis. Beyond it, I also created other visualizations and methods for evaluating the differences between agents and their relationship to the human demonstration. These techniques help us analyze the qualitative differences between the agents, and they differ between the Creator and Guiding Agents, reflecting the natural differences between the two.

The following visualizations are derived from a custom orthographic camera view that abstracts the Creator Agent's actions within the design space. The goal of this view is to help a user understand differences in movement patterns between agent variants that might not be obvious from a third-person camera, or even in first person in VR. By distilling the 3D environment into 2D, we can begin to see clearly the effect the human demonstration has on the trained behavior of the agent. The agent trained from demonstration without any prior trained weights displays smoother, more human-like movement and looking behavior.
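An orthographic top-down view amounts to dropping the vertical axis from each recorded position. The sketch assumes a y-up coordinate system, as in Unity, and adds a hypothetical smoothness proxy (the mean absolute change in heading between consecutive moves) as one possible way to put a number on the "smoother" movement seen in the diagrams; it is not a metric from the thesis.

```python
import numpy as np


def top_down(positions: np.ndarray) -> np.ndarray:
    """Orthographic top-down projection: drop the vertical (y) axis, keep x and z."""
    return positions[:, [0, 2]]


def mean_turn_angle(path2d: np.ndarray) -> float:
    """Mean absolute heading change, in radians, between consecutive 2D moves.

    Lower values mean fewer sharp turns. Steps with no movement are skipped.
    """
    moves = np.diff(path2d, axis=0)
    moves = moves[np.linalg.norm(moves, axis=1) > 1e-9]
    if len(moves) < 2:
        return 0.0
    heading = np.arctan2(moves[:, 1], moves[:, 0])
    # Wrap differences into (-pi, pi] so a turn from +179 to -179 degrees counts as 2 degrees.
    turn = np.angle(np.exp(1j * np.diff(heading)))
    return float(np.abs(turn).mean())


# Placeholder trajectory: (x, y, z) positions, y is height.
traj = np.array([[0.0, 1.0, 0.0], [1.0, 1.5, 0.0], [1.0, 1.2, 1.0]])
print(top_down(traj), mean_turn_angle(top_down(traj)))
```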

From left to right: Base Agent, Base-Demo Agent, and Demo Agent

For the Guiding Agent, more traditional metrics record the distribution of actions taken and the agent's performance relative to the target voxel design. In the views below, the agent trained with PPO alone achieves up to 50% of the target design. That said, the setup is rather simple and does not lend itself to high performance. The agent's behavior also clearly lacks the stylistic consistency suggested by the human demonstration. This is visible even in the Guiding Agent's latent space, where there is low correlation between the demonstrated human actions and the trained agents' action sequences. As a result, more work is needed before RL can be applied well to design action making in this field.
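Both Guiding Agent metrics can be sketched in a few lines of NumPy, assuming the built and target designs are boolean voxel grids of the same shape and that actions are discrete indices. Here "completion" is taken to mean the fraction of target voxels that are also filled in the built design; that definition and the function names are assumptions, not the thesis's exact formulas.

```python
import numpy as np


def target_completion(built: np.ndarray, target: np.ndarray) -> float:
    """Fraction of the target design's filled voxels that are also filled in the built design."""
    built, target = built.astype(bool), target.astype(bool)
    n_target = int(target.sum())
    if n_target == 0:
        raise ValueError("target design has no filled voxels")
    return float((built & target).sum() / n_target)


def action_distribution(actions: np.ndarray, n_actions: int) -> np.ndarray:
    """Share of steps spent on each discrete action index."""
    counts = np.bincount(actions, minlength=n_actions)
    return counts / counts.sum()


target = np.zeros((2, 2, 2), dtype=bool)
target[0] = True                  # 4 target voxels
built = np.zeros((2, 2, 2), dtype=bool)
built[0, 0, :] = True             # 2 of them built
built[1, 1, 1] = True             # plus 1 block outside the target
print(target_completion(built, target), action_distribution(np.array([0, 0, 1, 3]), 4))
```

This completion measure ignores blocks placed outside the target; an intersection-over-union score would penalize those as well, if that behavior matters for the comparison.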

Top row, left to right: Base Agent and Multi-Agent

Bottom row, left to right: Red 1 Agent and Red 2 Agent

Again, this is only a brief overview of my thesis's contributions and of what stood out to me while trying to evaluate RL agents, their variants, and the impact of human demonstration on a unique problem. One of my key conclusions, from a user's standpoint, is that in VR and design, human-like styles and behaviors matter a great deal, especially for co-creation and human-AI interaction. In VR in particular, the agent's actions are felt directly, so performance cannot be the sole criterion. I hope my work helps expand the conversation around AI and emerging immersive technologies, and I advocate for explorations that engage with more open-ended behavioral questions, such as design, in the context of humans and AI.

I will make the full thesis available when I am able; in the meantime, feel free to reach out with questions, as I have left out many details about methods, such as the training and implementation processes, that are essential to the thesis. That said, the focus here is not the technology itself but its impact.


Some nicely framed images of the latent spaces to enjoy
