RT Journal Article
T1 Do we need hundreds of classifiers to solve real world classification problems?
A1 Fernández Delgado, Manuel
A1 Cernadas García, Eva
A1 Barro Ameneiro, Senén
A1 Amorim, Dinani Gomes
K1 Classification
K1 UCI data base
K1 Random forest
K1 Support vector machine
K1 Neural networks
K1 Decision trees
K1 Ensembles
K1 Rule-based classifiers
K1 Discriminant analysis
K1 Bayesian classifiers
K1 Generalized linear models
K1 Partial least squares and principal component regression
K1 Multiple adaptive regression splines
K1 Nearest-neighbors
K1 Logistic and multinomial regression
AB We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other real problems of our own, in order to reach significant conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference from the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy, is not statistically significant. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
PB Journal of Machine Learning Research
SN 1532-4435
YR 2014
FD 2014
LK http://hdl.handle.net/10347/17792
UL http://hdl.handle.net/10347/17792
LA eng
NO Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. (2014). Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?, JMLR, 15, 3133–3181
NO We would like to acknowledge support from the Spanish Ministry of Science and Innovation (MICINN), which supported this work under projects TIN2011-22935 and TIN2012-32262
DS Minerva
RD 3 May 2026