The growing trend of delegating complex tasks to autonomous artificial agents in safety-critical socio-technical systems makes monitoring agent behavior critically important. This work proposes a probabilistic approach to on-line monitoring that combines optimal action selection with twin Gaussian processes (TGP). A Kullback-Leibler (KL) based metric characterizes the deviation of an agent's behavior (modeled as a controlled stochastic process) from its specification. The optimal behavior specification is obtained from a Linearly Solvable Markov Decision Process (LSMDP), in which an exponential transformation makes the Bellman equation linear, so that the optimal control policy is obtained in explicit form.
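To make the LSMDP step concrete, here is a minimal sketch on a toy problem. It assumes a discrete first-exit formulation with passive dynamics p(x'|x) and state cost q(x); the exponential transformation z(x) = exp(-v(x)) turns the Bellman equation into the linear fixed point z(x) = exp(-q(x)) * sum over x' of p(x'|x) z(x'), and the optimal policy is u*(x'|x), proportional to p(x'|x) z(x'). The four-state chain, the cost values, the observed transition row, and all function names are hypothetical choices for illustration. The per-state KL divergence is only one plausible form of the deviation metric, and the TGP component is not covered.

```python
import math


def solve_lsmdp(passive, cost, terminal, iters=500):
    """Fixed-point iteration on the linear Bellman equation z = exp(-q) * (P z).

    passive  : row-stochastic passive dynamics p(x'|x) as a list of lists
    cost     : state costs q(x)
    terminal : set of absorbing goal states (first-exit formulation, assumed)
    """
    n = len(cost)
    z = [1.0] * n
    for _ in range(iters):
        new_z = []
        for i in range(n):
            if i in terminal:
                # At a terminal state the desirability is fixed by its cost.
                new_z.append(math.exp(-cost[i]))
            else:
                expected = sum(passive[i][j] * z[j] for j in range(n))
                new_z.append(math.exp(-cost[i]) * expected)
        z = new_z
    return z


def optimal_policy(passive, z):
    """Explicit optimal policy: u*(x'|x) = p(x'|x) z(x') / sum_y p(y|x) z(y)."""
    policy = []
    for row in passive:
        weights = [p * zj for p, zj in zip(row, z)]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; infinite if p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical toy problem: a 4-state chain with goal state 3.
passive = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
cost = [1.0, 1.0, 1.0, 0.0]
z = solve_lsmdp(passive, cost, terminal={3})
policy = optimal_policy(passive, z)

# Hypothetical observed transition distribution of the monitored agent in state 2.
observed = [0.0, 0.3, 0.0, 0.7]
deviation = kl_divergence(observed, policy[2])
print("desirability z:", z)
print("optimal policy at state 2:", policy[2])
print("KL deviation at state 2:", deviation)
```

In this sketch a deviation of zero means the observed transitions match the optimal specification at that state, and larger values indicate stronger departure from it. Plain fixed-point iteration is used for brevity; since the equation is linear in z, a direct linear solve would work equally well.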