Software agents are programs that can observe their
environment and act in an attempt to reach their design
goals. In most cases, the choice of a particular agent
architecture determines the agent's behaviour in response to
the different problem states.
However, there are some problem domains in which it is
desirable that the agent learns a good action execution policy
by interacting with its environment. This kind of learning is
called reinforcement learning (RL), and it is useful in the
process control area. Given a problem state, the agent selects
an action to execute and receives an immediate reward; the
estimates for every action are then updated and, after a
certain period of time, the agent learns which action is the
best one to execute. Most reinforcement learning algorithms
execute a single simple action at a time, even when two or
more actions could be used together. This work involves the
use of RL algorithms to
find an optimal policy in a gridworld problem and proposes
a mechanism to combine actions of different types.
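
As a minimal sketch of the learning loop described above, the
following Python code applies tabular Q-learning, one common RL
algorithm, to a small gridworld. The 4x4 grid, the reward
values, the hyperparameters, and all function names are
illustrative assumptions, not details taken from this work. The
sketch updates only the estimate of the action just executed and
performs one simple action per step; it does not implement the
proposed mechanism for combining actions.

```python
import random

# Hypothetical 4x4 gridworld: start at (0, 0), goal at (3, 3).
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply an action; a move off the grid leaves that coordinate unchanged."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), SIZE - 1)
    col = min(max(state[1] + dc, 0), SIZE - 1)
    next_state = (row, col)
    # Assumed reward scheme: +1 at the goal, small penalty for every other step.
    reward = 1.0 if next_state == GOAL else -0.04
    return next_state, reward, next_state == GOAL


def q_learning(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Return a table of action-value estimates learned by interaction."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # Epsilon-greedy selection: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move the estimate toward the reward plus the discounted
            # value of the best action available in the next state.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


def greedy_policy(q):
    """Pick, for each non-goal state, the action with the highest estimate."""
    return {(r, c): max(ACTIONS, key=lambda a: q[((r, c), a)])
            for r in range(SIZE) for c in range(SIZE) if (r, c) != GOAL}


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for r in range(SIZE):
        print(" ".join(f"{policy.get((r, c), 'goal'):>5}" for c in range(SIZE)))
```

Running the script prints the learned action for each cell. The
small per-step penalty is one way to make shorter paths to the
goal score higher than longer ones.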