Do we need hundreds of classifiers to solve real world classification problems?

Fernández Delgado, Manuel; Cernadas García, Eva; Barro Ameneiro, Senén; Amorim, Dinani Gomes

doi:10.1117/1.JRS.11.015020

dc.contributor.affiliation	Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información	gl
dc.contributor.affiliation	Universidade de Santiago de Compostela. Departamento de Electrónica e Computación	gl
dc.contributor.area	Área de Enxeñaría e Arquitectura
dc.contributor.author	Fernández Delgado, Manuel
dc.contributor.author	Cernadas García, Eva
dc.contributor.author	Barro Ameneiro, Senén
dc.contributor.author	Amorim, Dinani Gomes
dc.date.accessioned	2018-11-22T09:16:07Z
dc.date.available	2018-11-22T09:16:07Z
dc.date.issued	2014
dc.description.abstract	We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other real problems of our own, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference is not statistically significant with respect to the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 out of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).	gl
dc.description.peerreviewed	Yes	gl
dc.description.sponsorship	We would like to acknowledge support from the Spanish Ministry of Science and Innovation (MICINN), which supported this work under projects TIN2011-22935 and TIN2012-32262	gl
dc.identifier.citation	Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15, 3133-3181	gl
dc.identifier.doi	10.1117/1.JRS.11.015020
dc.identifier.essn	1533-7928
dc.identifier.issn	1532-4435
dc.identifier.uri	http://hdl.handle.net/10347/17792
dc.language.iso	eng	gl
dc.publisher	Journal of Machine Learning Research	gl
dc.relation.projectID	info:eu-repo/grantAgreement/MICINN/Plan Nacional de I+D+i 2008-2011/TIN2011-22935/ES/SOFTLEARN: SOFT COMPUTING PARA MINERIA DE PROCESOS EN E-LEARNING
dc.relation.publisherversion	http://jmlr.org/papers/v15/delgado14a.html	gl
dc.rights	© 2014 Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim	gl
dc.rights.accessRights	open access	gl
dc.subject	Classification	gl
dc.subject	UCI data base	gl
dc.subject	Random forest	gl
dc.subject	Support vector machine	gl
dc.subject	Neural networks	gl
dc.subject	Decision trees	gl
dc.subject	Ensembles	gl
dc.subject	Rule-based classifiers	gl
dc.subject	Discriminant analysis	gl
dc.subject	Bayesian classifiers	gl
dc.subject	Generalized linear models	gl
dc.subject	Partial least squares and principal component regression	gl
dc.subject	Multiple adaptive regression splines	gl
dc.subject	Nearest-neighbors	gl
dc.subject	Logistic and multinomial regression	gl
dc.title	Do we need hundreds of classifiers to solve real world classification problems?	gl
dc.type	journal article	gl
dc.type.hasVersion	VoR	gl
dspace.entity.type	Publication
relation.isAuthorOfPublication	fe860f28-b531-4cad-859e-a38536a615ea
relation.isAuthorOfPublication	5b9d06b8-f9ab-4a8c-8105-38af29bd0562
relation.isAuthorOfPublication	aa2774e8-e4f1-4bdf-b706-6f69ce500e45
relation.isAuthorOfPublication.latestForDiscovery	fe860f28-b531-4cad-859e-a38536a615ea

Files

Original bundle

Name: 2014_jmlr_fernandez_do_we.pdf
Size: 536.31 KB
Format: Adobe Portable Document Format

Collections

Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
Electrónica e Computación