Do we need hundreds of classifiers to solve real world classification problems?

Fernández Delgado, Manuel; Cernadas García, Eva; Barro Ameneiro, Senén; Amorim, Dinani Gomes

doi:10.1117/1.JRS.11.015020

Files

2014_jmlr_fernandez_do_we.pdf (536.31 KB)

Identifiers

URI: http://hdl.handle.net/10347/17792

ISSN: 1532-4435

E-ISSN: 1533-7928

DOI: 10.1117/1.JRS.11.015020

Publication date

2014

Authors

Fernández Delgado, Manuel

Cernadas García, Eva

Barro Ameneiro, Senén

Amorim, Dinani Gomes

Publisher

Journal of Machine Learning Research

Abstract

We evaluate 179 classifiers from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbors, partial least squares and principal component regression, logistic and multinomial regression, multivariate adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) plus other real problems of our own, in order to reach significant conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference from the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy, is not statistically significant. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks and boosting ensembles (5 and 3 members in the top 20, respectively).
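
The "percentage of the maximum accuracy" figures quoted in the abstract can be illustrated with a small sketch. It assumes the measure is computed per data set as a classifier's accuracy divided by the best accuracy any classifier reaches on that data set, then averaged over data sets; that reading, the function names, and the toy numbers below are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean

# Hypothetical accuracies (in %) for three classifiers on three data sets.
# These numbers are invented for illustration; they are not results from the paper.
accuracies = {
    "rf":  {"d1": 90.0, "d2": 80.0, "d3": 70.0},
    "svm": {"d1": 85.0, "d2": 80.0, "d3": 75.0},
    "knn": {"d1": 60.0, "d2": 60.0, "d3": 75.0},
}

def percent_of_max_accuracy(acc):
    """Return, per classifier and data set, accuracy as a % of the best accuracy."""
    datasets = list(next(iter(acc.values())).keys())
    # Best accuracy reached by any classifier on each data set.
    best = {d: max(acc[c][d] for c in acc) for d in datasets}
    return {c: {d: 100.0 * acc[c][d] / best[d] for d in datasets} for c in acc}

def summarize(acc, threshold=90.0):
    """Return (mean % of max accuracy, % of data sets above threshold) per classifier."""
    pma = percent_of_max_accuracy(acc)
    summary = {}
    for c, per_dataset in pma.items():
        values = list(per_dataset.values())
        share_above = 100.0 * sum(v > threshold for v in values) / len(values)
        summary[c] = (mean(values), share_above)
    return summary

summary = summarize(accuracies)
for name, (avg_pma, share) in summary.items():
    print(f"{name}: {avg_pma:.1f}% of max accuracy, above 90% on {share:.1f}% of data sets")
```

Under this reading, a classifier that is never the single best on any data set can still score close to 100% if it stays near the best everywhere, which is why the measure is reported alongside rankings.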

Bibliographic citation

Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. (2014). Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15, 3133-3181.

Publisher version

http://jmlr.org/papers/v15/delgado14a.html

Collections

Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
Electrónica e Computación
