Nowadays, there is growing interest in pattern recognition for tasks such as weather forecasting, route recommendation, intrusion detection, and face detection. These tasks can be modelled as classification problems, and a common approach is to use an ensemble classification model. A widely used ensemble is the Mixture of Experts (MoE), a modular artificial neural network consisting of two types of subcomponents: expert networks and a gating network. Their combination creates a competitive environment in which the experts seek to capture patterns in the data source and specialize in particular regions of it, all supervised by the gating network, which acts as a mediator and weights the quality of the solution delivered by each expert. We observe that this architecture assumes that each data point is influenced by a single gate output; consequently, training can be misleading on real datasets where the data are better explained by multiple experts. In this work, we present a variant of the traditional MoE model that maximizes the entropy of the evaluation function of the gating network jointly with standard error minimization. The results show the advantage of our approach on multiple datasets in terms of accuracy. As future work, we plan to apply this idea to the Mixture of Experts with embedded feature selection.
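To make the proposed objective concrete, the display below is a plausible formalization of an entropy-regularized MoE training loss, given only as a sketch: the mixture-likelihood notation, the gate entropy H, and the trade-off weight \lambda are our assumptions and are not specified in this abstract.

\[
\mathcal{L}(\theta) \;=\; -\sum_{n=1}^{N}\log\sum_{k=1}^{K} g_k(x_n)\, p_k(y_n \mid x_n)
\;-\; \lambda \sum_{n=1}^{N} H\big(g(x_n)\big),
\qquad
H\big(g(x_n)\big) = -\sum_{k=1}^{K} g_k(x_n)\log g_k(x_n),
\]

where \(g_k(x_n)\) is the gating weight assigned to expert \(k\) for input \(x_n\), \(p_k(y_n \mid x_n)\) is that expert's predictive model, and \(\lambda > 0\) controls the strength of the entropy term. Minimizing \(\mathcal{L}\) performs standard error (negative log-likelihood) minimization while maximizing the gate entropy, which discourages attributing each data point to a single expert.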