Press "fast" to speed up the simulation.

This is a practical implementation of the Q-Learning reinforcement learning algorithm. The tutorial A Painless Q-Learning Tutorial is a very nice introduction. Q-Learning learns, for each state, the action that maximizes the expected cumulative reward.
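The learning step can be sketched as the standard Q-Learning update rule. This is a minimal sketch: the learning rate, discount factor, and dict-based Q-table are illustrative assumptions, not taken from the demo's actual source.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

Q = defaultdict(float)  # (state, action) -> learned value, defaults to 0.0

def update(state, action, reward, next_state, actions):
    # Standard Q-Learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Each update nudges the value of the taken action toward the reward just received plus the discounted value of the best action available in the next state.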

The black circle represents the agent. Green circles represent food (+1) and gray circles represent poison (-1). Food and poison are inserted with equal probability. The agent can move left, move right, or stay. A state is the string representation of the objects in front of the agent: that is what the agent sees.
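A hypothetical sketch of such string-encoded states: the state key is simply a string built from the objects in the cells ahead of the agent. The cell labels and the separator are assumptions for illustration, not the demo's actual format.

```python
def make_state(cells_ahead):
    # e.g. the three cells in front of the agent, nearest first
    return "|".join(cells_ahead)

make_state(["food", "empty", "poison"])  # -> "food|empty|poison"
```

Because states are plain strings, the Q-table needs no fixed-size state space; new states simply appear as new keys when first encountered.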

With a probability of 10%, the agent explores actions whose outcomes in the current state are not yet known.
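This exploration rule can be sketched roughly as follows, assuming a dict-based Q-table and the action list from above; the exact selection logic in the demo may differ.

```python
import random

EPSILON = 0.1                        # 10% exploration probability from the text
ACTIONS = ["left", "right", "stay"]

def choose_action(Q, state, rng=random):
    # Explore only among actions whose outcome in this state is unknown,
    # as described above; otherwise act greedily on the learned values.
    unknown = [a for a in ACTIONS if (state, a) not in Q]
    if unknown and rng.random() < EPSILON:
        return rng.choice(unknown)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
```

Once every action in a state has been tried, the agent acts purely greedily there, which fits the description of exploring only unknown outcomes.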

We use online learning, where training and using the algorithm happen simultaneously, i.e. we don't need to collect a large amount of data before the algorithm is trained. A consequence is that performance is poor in the early stages. In the beginning the behavior resembles a random walk; as time goes by, performance improves considerably: the agent not only avoids poison but also catches more food. Simple changes to this code can produce agents that battle, chase others, run away, or follow paths.
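The online loop described above can be sketched end to end: the agent acts, receives a reward, and updates its Q-table in the same step. The toy environment here (a random stream of food/poison/empty cells standing in for the moving world) and all parameter values are assumptions for illustration only.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed parameters
ACTIONS = ["left", "right", "stay"]

def run(steps=1000, seed=0):
    # Toy stand-in for the demo's world: each step the agent "arrives" at a
    # random object, which provides both the reward and the next state.
    rng = random.Random(seed)
    Q = {}
    state = "empty"
    total = 0
    for _ in range(steps):
        if rng.random() < EPSILON:
            action = rng.choice(ACTIONS)                       # explore
        else:
            action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
        next_state = rng.choice(["food", "poison", "empty"])
        reward = {"food": 1, "poison": -1, "empty": 0}[next_state]
        total += reward
        # learn from this single experience immediately (online learning)
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
        state = next_state
    return Q, total
```

No separate training phase exists: the very first actions are effectively random, and the Q-table fills in as experience accumulates, matching the random-walk-then-improvement behavior described above.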