Predictive Performance of Cross-Validation Techniques in Classification Models

Authors

  • O. F. Adedeji

Keywords:

Cross Validation, Classification Model, Machine Learning, K-fold, Overfitting

Abstract

Machine learning algorithms have driven breakthroughs in scientific research and other dynamic research areas. One step in the machine learning process is data splitting, which is used to measure the generalization ability of the learning algorithm. However, because sample sizes in many clinical and biological studies are often small, data splitting may lose relevance when samples are limited. This study investigates generalization performance on small sample sizes and seeks to identify the most appropriate split ratio. To this end, the study considers a family of Cross-Validation (CV) techniques, namely K-fold, Nested, and Repeated CV, under different split ratios and sample sizes. Using Naïve Bayes and Logistic Regression algorithms in simulation experiments, the study finds that when the sample size is small, the best split ratio is either 50:50 or 60:40.
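The split ratios discussed above map naturally onto K-fold CV: with k = 2 folds each train/test split is 50:50, while k = 5 yields roughly 80:20. The following is a minimal, library-free sketch of K-fold index splitting; the sample size of 30 and the chosen k values are illustrative assumptions, not values taken from the paper.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split indices 0..n_samples-1 into k shuffled, near-equal folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # shuffle once for reproducibility
    folds = [idx[i::k] for i in range(k)]     # round-robin assignment to folds
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

# Illustrative small-sample setting: 30 observations.
# k=2 gives a 50:50 train/test ratio per fold (15 train, 15 test).
splits_5050 = k_fold_indices(30, k=2)
# k=5 gives an 80:20 ratio per fold (24 train, 6 test).
splits_8020 = k_fold_indices(30, k=5)
```

A Repeated CV variant would simply call `k_fold_indices` several times with different seeds and average the resulting scores.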

Published

2024-07-18

How to Cite

Adedeji, O. F. (2024). Predictive Performance of Cross-Validation Techniques in Classification Models. JOURNAL OF SCIENCE RESEARCH, 20(1). Retrieved from http://jsribadan.ng/index.php/ojs/article/view/165