Do we need hundreds of classifiers to solve real world classification problems?

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información [gl]
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Electrónica e Computación [gl]
dc.contributor.area: Área de Enxeñaría e Arquitectura
dc.contributor.author: Fernández Delgado, Manuel
dc.contributor.author: Cernadas García, Eva
dc.contributor.author: Barro Ameneiro, Senén
dc.contributor.author: Amorim, Dinani Gomes
dc.date.accessioned: 2018-11-22T09:16:07Z
dc.date.available: 2018-11-22T09:16:07Z
dc.date.issued: 2014
dc.description.abstract: We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other real problems of our own, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, its difference from the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy, is not statistically significant. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively). [gl]
dc.description.peerreviewed: SI [gl]
dc.description.sponsorship: We would like to acknowledge support from the Spanish Ministry of Science and Innovation (MICINN), which supported this work under projects TIN2011-22935 and TIN2012-32262 [gl]
dc.identifier.citation: Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. (2014). Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?, JMLR, 15, 3133–3181 [gl]
dc.identifier.doi: 10.1117/1.JRS.11.015020
dc.identifier.essn: 1533-7928
dc.identifier.issn: 1532-4435
dc.identifier.uri: http://hdl.handle.net/10347/17792
dc.language.iso: eng [gl]
dc.publisher: Journal of Machine Learning Research [gl]
dc.relation.projectID: info:eu-repo/grantAgreement/MICINN/Plan Nacional de I+D+i 2008-2011/TIN2011-22935/ES/SOFTLEARN: SOFT COMPUTING PARA MINERIA DE PROCESOS EN E-LEARNING
dc.relation.publisherversion: http://jmlr.org/papers/v15/delgado14a.html [gl]
dc.rights: © 2014 Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim [gl]
dc.rights.accessRights: open access [gl]
dc.subject: Classification [gl]
dc.subject: UCI data base [gl]
dc.subject: Random forest [gl]
dc.subject: Support vector machine [gl]
dc.subject: Neural networks [gl]
dc.subject: Decision trees [gl]
dc.subject: Ensembles [gl]
dc.subject: Rule-based classifiers [gl]
dc.subject: Discriminant analysis [gl]
dc.subject: Bayesian classifiers [gl]
dc.subject: Generalized linear models [gl]
dc.subject: Partial least squares and principal component regression [gl]
dc.subject: Multiple adaptive regression splines [gl]
dc.subject: Nearest-neighbors [gl]
dc.subject: Logistic and multinomial regression [gl]
dc.title: Do we need hundreds of classifiers to solve real world classification problems? [gl]
dc.type: journal article [gl]
dc.type.hasVersion: VoR [gl]
dspace.entity.type: Publication
relation.isAuthorOfPublication: fe860f28-b531-4cad-859e-a38536a615ea
relation.isAuthorOfPublication: 5b9d06b8-f9ab-4a8c-8105-38af29bd0562
relation.isAuthorOfPublication: aa2774e8-e4f1-4bdf-b706-6f69ce500e45
relation.isAuthorOfPublication.latestForDiscovery: fe860f28-b531-4cad-859e-a38536a615ea

Files

Original bundle

Name: 2014_jmlr_fernandez_do_we.pdf
Size: 536.31 KB
Format: Adobe Portable Document Format