Cluster Ensembles for Big Data Mining Problems

Pividori, Milton; Stegmayer, Georgina; Milone, Diego H.

dc.date.accessioned	2016-04-01T12:25:02Z
dc.date.available	2016-04-01T12:25:02Z
dc.date.issued	2015
dc.identifier.uri	http://sedici.unlp.edu.ar/handle/10915/51984
dc.description.abstract	Mining big data involves several problems and new challenges beyond the huge volume of information. On the one hand, these data generally come from autonomous and decentralized sources, so their dimensionality is heterogeneous and diverse, and they generally involve privacy issues. On the other hand, algorithms for mining data, such as clustering methods, have particular characteristics that make them useful for different types of data mining problems. Due to the huge amount of information, the task of choosing a single clustering approach becomes even more difficult. For instance, k-means, a very popular algorithm, assumes spherical clusters in the data; hierarchical approaches can be used when there is interest in finding hierarchical structure; expectation-maximization iteratively adjusts the parameters of a statistical model to fit the observed data. Moreover, all these methods work properly only with relatively small data sets. Large data volumes often make their application unfeasible, especially when the data come from autonomous sources that are constantly growing and evolving. In recent years, a new clustering approach has emerged, called consensus clustering or cluster ensembles. Instead of running a single algorithm, this approach first produces a set of data partitions (the ensemble) by applying different clustering techniques to the same original data set. This ensemble is then processed by a consensus function, which produces a single consensus partition that outperforms the individual solutions in the input ensemble. This approach has been successfully employed for distributed data mining, which makes it very interesting and applicable in the big data context. Although many techniques have been proposed for large data sets, most of them focus on making individual components more efficient rather than on improving the whole consensus approach for the case of big data.	en
dc.format.extent	52-54	es
dc.language	en	es
dc.subject	Data mining	es
dc.subject	big data	es
dc.subject	Clustering	es
dc.title	Cluster Ensembles for Big Data Mining Problems	en
dc.type	Conference object	es
sedici.identifier.uri	http://44jaiio.sadio.org.ar/sites/default/files/agranda52-54.pdf	es
sedici.identifier.issn	2451-7569	es
sedici.creator.person	Pividori, Milton	es
sedici.creator.person	Stegmayer, Georgina	es
sedici.creator.person	Milone, Diego H.	es
sedici.subject.materias	Computer Science	es
sedici.description.fulltext	true	es
mods.originInfo.place	Sociedad Argentina de Informática e Investigación Operativa (SADIO)	es
sedici.subtype	Conference object	es
sedici.rights.license	Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
sedici.rights.uri	http://creativecommons.org/licenses/by-sa/3.0/
sedici.date.exposure	2015-09
sedici.relation.event	Simposio Argentino de GRANdes DAtos (AGRANDA 2015) - JAIIO 44 (Rosario, 2015)	es
sedici.description.peerReview	peer-review	es
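
The two-stage procedure described in the abstract (generate an ensemble of partitions, then apply a consensus function) can be illustrated with a small sketch. The abstract does not say which consensus function the authors use, so the one below is an assumption chosen for brevity: a co-association (evidence accumulation) matrix that is thresholded and cut into connected components. The function names `co_association` and `consensus_partition`, the `threshold` parameter, and the toy ensemble are all hypothetical.

```python
from itertools import combinations


def co_association(ensemble):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(ensemble[0])
    m = len(ensemble)
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_partition(ensemble, threshold=0.5):
    """Assumed consensus function: link objects that are co-clustered in more
    than `threshold` of the partitions, then treat each connected component
    of the resulting graph as one consensus cluster."""
    coassoc = co_association(ensemble)
    n = len(coassoc)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0..k-1 in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy ensemble: three partitions of six objects. In practice each partition
# would come from a different clustering technique; label names are arbitrary
# and need not agree across partitions.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],  # object 2 placed in the "wrong" cluster
    [1, 1, 1, 0, 0, 2],  # object 5 split off on its own
]
consensus = consensus_partition(ensemble)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix only records whether two objects share a cluster, the sketch is insensitive to how each partition names its clusters. The matrix is quadratic in the number of objects, which is one reason a consensus step like this one does not scale to big data without further work.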

This item appears in the following collection(s)

44 Jornadas Argentinas de Informática e Investigación Operativa (JAIIO) → Simposio Argentino de GRANdes DAtos (AGRANDA 2015)

Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)