Project Overview
Sequence to Sequence Learning with Keras (Beta) – Authored by Hayson Cheung (hayson.cheung@mail.utoronto.ca).
Adapted from the seminal work of Ilya Sutskever, Oriol Vinyals, and Quoc V. Le in their paper
Sequence to Sequence Learning with Neural Networks (NIPS 2014).
Overview: The LSTM Encoder-Decoder
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) designed to handle sequential data. Unlike traditional RNNs, LSTMs mitigate the vanishing gradient problem by using gating mechanisms that control the flow of information.
Encoders and Decoders in LSTMs
Below is a figure of the LSTM Encoder-Decoder model. The encoder processes the input sequence and generates a context vector, which is then fed into the decoder. The decoder generates the output sequence based on the encoder's context vector, its own hidden state, and the previously generated output.

LSTM Encoder-Decoder architecture for sequence-to-sequence learning.
Encoding Process
In an LSTM Encoder-Decoder framework, the encoder processes the input sequence $$ X = (x_1, x_2, ..., x_T) $$ and compresses it into a fixed-size latent vector $$ h_T $$, also known as the **context vector**. This latent space representation captures the semantics of the entire sequence.
The LSTM cell is governed by the following equations:
\[ f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \]
\[ i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \]
\[ \tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \]
\[ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \]
\[ o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \]
\[ h_t = o_t \odot \tanh(C_t) \]

Here, $$ f_t, i_t, o_t $$ represent the forget, input, and output gates, while $$ C_t $$ is the cell state that stores long-term dependencies. The hidden state $$ h_T $$ at the final timestep encodes the full input sequence.
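To make the gate equations concrete, here is a minimal NumPy sketch of a single LSTM cell step. The weight matrices `W_*`, `U_*` and biases `b_*` are illustrative placeholders, not the parameters Keras actually learns internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM cell update following the equations above.

    params: dict of illustrative matrices W_* (hidden x input),
    U_* (hidden x hidden), and bias vectors b_* (hidden,).
    """
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h_prev + params["b_i"])      # input gate
    C_tilde = np.tanh(params["W_c"] @ x_t + params["U_c"] @ h_prev + params["b_c"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde                                               # new cell state
    o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h_prev + params["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                                                         # new hidden state
    return h_t, C_t
```

Iterating `lstm_step` over $$ x_1, ..., x_T $$ and keeping the final $$ (h_T, C_T) $$ is exactly what the encoder does.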
Latent Space Representation
The hidden state $$ h_T $$ and cell state $$ C_T $$ together form a **latent representation** of the input sequence. This latent space is crucial because it acts as a bottleneck that forces the model to learn a compact, meaningful representation of the input before generating output.
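As a sketch of how this latent representation is obtained in Keras (using hypothetical names such as `latent_dim` and `num_encoder_tokens` for illustration), `return_state=True` exposes the final hidden and cell states:

```python
from tensorflow.keras.layers import Input, LSTM

latent_dim = 256           # size of the latent representation (illustrative)
num_encoder_tokens = 1000  # input vocabulary size (illustrative)

# Encoder: reads the one-hot input sequence and keeps only its final states.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
_, state_h, state_c = encoder_lstm(encoder_inputs)  # h_T and C_T
encoder_states = [state_h, state_c]                 # the latent representation
```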
Decoding Process
The **decoder** receives the latent representation and generates an output sequence $$ Y = (y_1, y_2, ..., y_{T'}) $$. At each timestep, it takes the previous output token and hidden state as input:
\[ s_t = \text{LSTM}(y_{t-1}, s_{t-1}) \]
\[ p(y_t | y_1, ..., y_{t-1}, X) = \text{softmax}(W_s s_t + b_s) \]

This probability distribution is used to select the next token in the output sequence.
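Continuing the encoder sketch above (and reusing its `latent_dim`, `encoder_inputs`, and `encoder_states`), a minimal decoder can be wired up as follows; `num_decoder_tokens` is again an illustrative placeholder:

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_decoder_tokens = 1000  # output vocabulary size (illustrative)

# Decoder: starts from the encoder's latent states and predicts one token
# distribution per timestep (teacher forcing during training).
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

# Full training model: (input sequence, shifted target sequence) -> token distributions.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```

The softmax output at each timestep corresponds to $$ p(y_t | y_1, ..., y_{t-1}, X) $$ in the equations above.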
Limitations & Future Improvements
While this approach is effective, using a single context vector $$ h_T $$ to encode the entire sequence can be a bottleneck. To improve performance, modern architectures introduce **attention mechanisms**, which allow the decoder to focus on different parts of the input sequence dynamically.
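As an illustrative extension (not part of this project's current code), Keras ships a dot-product `Attention` layer that lets the decoder attend over all encoder hidden states instead of only $$ h_T $$; the sketch below assumes the same placeholder dimensions as before:

```python
from tensorflow.keras.layers import Input, LSTM, Dense, Attention, Concatenate
from tensorflow.keras.models import Model

latent_dim = 256           # illustrative
num_encoder_tokens = 1000  # illustrative
num_decoder_tokens = 1000  # illustrative

# Encoder now returns the full sequence of hidden states, not just the last one.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_sequences=True,
                                         return_state=True)(encoder_inputs)

decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=[state_h, state_c])

# Dot-product (Luong-style) attention: queries are decoder states,
# keys/values are the encoder's per-timestep hidden states.
context = Attention()([decoder_outputs, encoder_outputs])
combined = Concatenate()([decoder_outputs, context])
outputs = Dense(num_decoder_tokens, activation="softmax")(combined)

model = Model([encoder_inputs, decoder_inputs], outputs)
```

Because the decoder can now weight every encoder timestep dynamically, the model no longer has to squeeze the entire input sequence through the single fixed-size context vector.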