Numerical Taxonomy aims to group operational taxonomic units (OTUs, or taxa) into clusters through numerical methods, using so-called structure analysis. Obtaining clusters that constitute families was the purpose of this latest series of projects.
Structural analysis, based on the phenotypic characteristics of the OTUs, exhibits the relationships between two or more OTUs in terms of degrees of similarity.
Entities formed by dynamic domains of attributes change according to taxonomical requirements: the classification of objects to form families.
Taxonomic objects are represented by applying the semantics of the Dynamic Relational Database Model.
Families of OTUs are obtained using two tools: (i) the Euclidean distance and (ii) nearest-neighbor techniques. Taxonomic evidence is thus gathered to quantify the similarity of each pair of OTUs (pair-group method) obtained from the basic data matrix.
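As an illustration of the distance step, the following minimal Python sketch computes the Euclidean distance between every pair of OTUs in a small basic data matrix and picks the closest pair, which is the pair a nearest-neighbor pair-group procedure would merge first. The data values and the function names (euclidean, distance_matrix, nearest_pair) are hypothetical and are not taken from the original work.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two OTUs given as lists of character values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(data):
    """Symmetric matrix of pairwise distances; rows of `data` are OTUs."""
    n = len(data)
    return [[euclidean(data[i], data[j]) for j in range(n)] for i in range(n)]

def nearest_pair(dist):
    """Return (distance, i, j) for the closest pair of distinct OTUs."""
    best = None
    n = len(dist)
    for i in range(n):
        for j in range(i + 1, n):
            if best is None or dist[i][j] < best[0]:
                best = (dist[i][j], i, j)
    return best

# Hypothetical basic data matrix: 3 OTUs described by 2 characters each.
basic_data_matrix = [
    [0.0, 0.0],
    [3.0, 4.0],
    [3.0, 5.0],
]

dist = distance_matrix(basic_data_matrix)
closest = nearest_pair(dist)  # OTUs 1 and 2 are the most similar pair
```

In a full pair-group procedure this step would be repeated, merging the closest pair and updating the distances, until the desired families are formed.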
The main contribution so far is the introduction of the concept of the spectrum of the OTUs, based on the states of their characters. The concept of families’ spectra emerges when the superposition principle is applied to the spectra of the OTUs, and the groups are delimited through the maximum of the Bienaymé-Tchebycheff relation, which determines invariants (centroid, variance and radius).
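The three invariants named above can be illustrated with a short Python sketch. The abstract does not define how they are computed, so everything here is an assumption: the centroid is taken as the mean of the member OTUs, the variance as the mean squared Euclidean distance to the centroid, and the radius as k standard deviations. The Bienaymé-Tchebycheff inequality guarantees that at most a fraction 1/k² of a distribution lies k or more standard deviations from its mean, which motivates this kind of bound, but the choice k = 2 and the sample family are purely hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of the OTUs in a family."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def variance(points, center):
    """Mean squared Euclidean distance of the OTUs to the centroid (assumed definition)."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, center)) for p in points
    ) / len(points)

def radius(var, k=2.0):
    """Assumed radius: k standard deviations; k = 2 is an illustrative choice."""
    return k * math.sqrt(var)

# Hypothetical family of 4 OTUs in a 2-character space.
family = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]

c = centroid(family)
v = variance(family, c)
r = radius(v)
```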
A new taxonomic criterion is thereby formulated.
An astronomical application is worked out. The result is a new criterion for the classification of asteroids in the hyperspace of orbital proper elements.
Thus, a new approach to Computational Taxonomy is presented, one that has already been employed in Data Mining.
This paper analyses the application of Machine Learning techniques to Data Mining. We focus on the TDIDT (Top-Down Induction of Decision Trees) family of algorithms, which induce decision trees from pre-classified data, and in particular on the ID3 and C4.5 algorithms created by Quinlan. We tried to determine the degree of efficiency achieved by the TDIDT family’s algorithms when applied in data mining to generate valid models of the data in classification problems, using the gain of entropy (information gain) as the splitting criterion.
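The gain of entropy mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the information-gain measure used by ID3 to choose the attribute to split on, not the implementation evaluated in the paper; the toy data set, the attribute names and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning the data on attribute `attr`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Hypothetical pre-classified data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

A TDIDT algorithm would select the attribute with the largest gain (here "outlook") as the root test and recurse on each partition. C4.5 normally refines this measure into the gain ratio, which is not shown here.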
Informatics (Data Mining and Computational Taxonomy) has always been the original objective of our research.