Machine Learning-Based Obesity Classification: A Comparative Study Using Self-Reported Survey Data and Ensemble Learning Models

Authors

  • Gregorius Airlangga, Atma Jaya Catholic University of Indonesia, Indonesia

DOI:

https://doi.org/10.37012/jtik.v11i1.2585

Abstract

Obesity is one of the most pressing global health challenges of the 21st century: its prevalence is rising rapidly, and it increases the risk of cardiovascular disease, diabetes, and other metabolic disorders. Traditional assessment methods, such as BMI-based classification, often fail to incorporate lifestyle and behavioral factors, which limits their predictive capability. This study explores the use of machine learning for obesity classification based on self-reported survey data collected from individuals in Mexico, Peru, and Colombia. The dataset comprises 2,111 instances with 17 attributes covering demographic characteristics, eating habits, and physical activity levels. Eight machine learning models were evaluated using 10-fold cross-validation: Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbors, Naïve Bayes, and AdaBoost. Gradient Boosting achieved the highest accuracy, 96.49%, followed by Random Forest and SVM, which suggests that ensemble methods such as Gradient Boosting and Random Forest are effective at capturing complex feature interactions. In contrast, Naïve Bayes and AdaBoost exhibited the lowest classification performance, which the study attributes to the strong feature-independence assumption of Naïve Bayes and the sensitivity of AdaBoost to noisy data. The findings highlight the potential of machine learning in obesity classification and underscore the need for advanced predictive models to enhance public health monitoring and intervention strategies.
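
The evaluation protocol described in the abstract (eight classifiers compared by 10-fold cross-validation accuracy) can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the author's code. The survey dataset is not reproduced here, so a small synthetic dataset from `make_classification` stands in for it. The number of classes, the 16 input features (assuming one of the 17 attributes is the label), the feature scaling, the stratified and shuffled folds, and all hyperparameters are assumptions, so the printed accuracies will not match the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (assumption: sizes and class count
# are placeholders chosen to keep the example fast).
X, y = make_classification(
    n_samples=600,
    n_features=16,
    n_informative=8,
    n_redundant=2,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# The eight model families named in the abstract, with default-ish settings.
# Scale-sensitive models are wrapped in a pipeline with StandardScaler
# (assumption: the abstract does not describe preprocessing).
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# 10-fold cross-validation; stratification and shuffling are assumptions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Mean accuracy over the 10 folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models ranked from highest to lowest mean accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {acc:.4f}")
```

To run the same comparison on real survey data, `X` and `y` would be replaced by the encoded feature matrix and the obesity-level labels; categorical attributes would need encoding first, which this sketch does not cover.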


Published

2025-03-25
