Adam Campbell's Website
Evolving the goal priorities of autonomous agents


Project goals

The objective of this research is to design a multi-agent control system that allows agents to intelligently prioritize and combine the actions of multiple, conflicting goals.

Approach

The method proposed in this research uses a Genetic Algorithm (GA) to evolve the weights associated with each of the agents' goals. At each time step in the simulation, an agent gathers information about its immediate environment and sends this data to five different goal functions. The five goal functions are: avoid obstacles, avoid agents, momentum, follow obstacle, and go to target. Each goal function returns a two-dimensional vector that tells the agent where it should move in order to accomplish that goal. For example, the avoid obstacle goal function returns a vector perpendicular to the obstacle closest to the agent. Next, the five vectors returned by the goal functions are multiplied by their respective goal weights. These goal weights range from 0.0 to 1.0 and indicate the priority of conflicting goals. Finally, the weighted vectors are summed and then normalized, giving the agent its next movement direction. It is these goal weights that the GA evolves.
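A minimal sketch of the weighted combination step is shown below. The function names, the example vectors, and the weight values are illustrative assumptions, not the actual implementation; only the weight-sum-normalize structure comes from the description above.

```python
import math

def combine_goal_vectors(goal_vectors, goal_weights):
    """Weight each goal's 2-D vector, sum them, and normalize the result.

    goal_vectors: list of (x, y) tuples, one per goal function
    goal_weights: list of weights in [0.0, 1.0], evolved by the GA
    """
    x = sum(w * vx for w, (vx, vy) in zip(goal_weights, goal_vectors))
    y = sum(w * vy for w, (vx, vy) in zip(goal_weights, goal_vectors))
    length = math.hypot(x, y)
    if length == 0.0:
        return (0.0, 0.0)            # no net preference this time step
    return (x / length, y / length)  # unit vector: the agent's next movement direction

# Hypothetical vectors from the five goals (avoid obstacles, avoid agents,
# momentum, follow obstacle, go to target) and one candidate weight genome.
vectors = [(0.0, 1.0), (0.2, 0.0), (1.0, 0.0), (0.7, 0.0), (0.5, -0.5)]
weights = [0.9, 0.3, 0.5, 0.6, 0.8]
print(combine_goal_vectors(vectors, weights))
```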

Further details


These two images show the simulation environments in which the agents learn. The black round object centered at the top of each environment is where the agents start, the four grey regions near the corners are the areas of interest (AOIs), and the smaller black squares are the obstacles.

This image illustrates the problem that occurs when the vectors from the avoid obstacle and follow obstacle goals are summed. The black area is the obstacle, the vector perpendicular to the obstacle is the one returned by the avoid obstacle goal function, and the parallel vector is the one returned by the follow obstacle goal function. The resulting vector is at a 45-degree angle to the obstacle. Without the comfort value, the agents would never follow obstacles.
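To make the 45-degree result concrete, here is a quick check assuming unit-length vectors and equal weights on the two goals (the specific coordinates are only for illustration):

```python
import math

avoid_obstacle = (0.0, 1.0)   # perpendicular to the obstacle, pointing away from it
follow_obstacle = (1.0, 0.0)  # parallel to the obstacle

sx = avoid_obstacle[0] + follow_obstacle[0]
sy = avoid_obstacle[1] + follow_obstacle[1]
print(math.degrees(math.atan2(sy, sx)))  # 45.0: the agent drifts away rather than following
```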


These images show the effects of adding random vectors of different lengths to the final direction vector obtained from the goal functions. The random vector length in the top-left image is 0.0, whereas the top-right image was obtained using a length of 0.01. With no random vector, the agents bounce back and forth between the top and bottom of the environment. The top-right image shows that even a small amount of randomness in the system can significantly affect its overall behavior and eliminate this repetitive pattern. The bottom-left image uses a random vector length of 0.02, and the bottom-right uses 0.04.
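One plausible reading of how this randomness enters the movement update is sketched below: a vector of fixed length and random direction is added to the combined goal direction, which is then renormalized. The exact point in the update where the perturbation is applied is an assumption here.

```python
import math
import random

def perturb_direction(direction, random_vector_length):
    """Add a random vector of the given length to the combined goal direction,
    then renormalize. A length of 0.0 leaves the direction unchanged."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    x = direction[0] + random_vector_length * math.cos(theta)
    y = direction[1] + random_vector_length * math.sin(theta)
    length = math.hypot(x, y)
    return (x / length, y / length) if length > 0.0 else direction
```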

These graphs show the average fitness per generation over the thirty runs for each environment. Fitness is computed by summing the number of agents alive and the number of AOIs seen by each agent. The graphs are ordered to correspond to the environments above. Along with the average fitness, its standard deviation and the best fitness per generation are shown.
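A rough sketch of that fitness computation, assuming a simple unweighted sum of the two terms (the agent attributes and any scaling between the terms are assumptions, not taken from the original description):

```python
def fitness(agents):
    """agents: objects with .alive (bool) and .aois_seen (set of AOI ids)."""
    alive = sum(1 for a in agents if a.alive)
    aois_seen = sum(len(a.aois_seen) for a in agents)  # AOIs counted per agent
    return alive + aois_seen
```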

These two figures plot the evolved values from each individual in the last generation of every run. If a goal function is not used by the agents in an environment (such as the avoid obstacle goal in Environment 1), a wide range of evolved values is produced. On the other hand, if the parameter is important to the agents' behavior, it typically evolves into a narrow range.

This image shows an extremely efficient behavior that was evolved. There is no explicit "follow the leader" strategy in our system, but because the agents are deterministic and the evolved random vector length was 0.0, a "follow the leader" behavior emerged. The lines extending from the agents to the obstacles indicate that an agent is sensing that obstacle.

Future work

  • Social interactions between agents
    • Allow communication of data between agents
    • New immediate goal functions needed
  • Allow agents to have more than one set of goal weights
    • Depending on the agent's state (hungry, low on fuel, in danger, etc.) use a different set of goal weights
  • Other ways to combine vectors from immediate goal functions
    • Non-linear combination of vectors
    • Genetic programming
      • Currently being worked on at George Mason University
  • Better test scenarios
  • Evolve parameters that generalize well to unseen environments

Additional information

  • UCF I2Forum 2006 poster [ppt]
  • UCF Graduate Research Forum 2006 presentation [ppt]