Master's in Statistics Oral Defense: "Classification Trees and Rule-Based Modeling Using the C5.0 Algorithm for Self-Image Across Sex and Race in Saint Louis"

Rohan Shirali

Abstract: I use the C5.0 algorithm to build classification trees and rule-based models for my study population. I model two outcomes: a binary self-image variable as a function of sex, age, race, zip code, and a ratio of reported versus measured body mass index (BMI); and a multi-level categorical weight description variable as a function of the same predictors plus weight strategy. For both outcomes, I compare the performance of C5.0 with and without rules and boosting, for independent and grouped categories. Sample size constraints limit this comparison. Ultimately, C5.0 performed best when modeling the binary variable with either rules or boosting for independent categories.

Advisor: Todd Kuffner