Anticipated Date of Graduation

Spring 2024

Document Type

Thesis

Degree Name

Master of Science in Mathematical Sciences

Department

Mathematical Sciences

First Advisor

Doug Darbro

Abstract

Discriminant analysis is a statistical technique for assigning observations to predefined classes. Many studies have compared the performance of different classification methods. This study compares Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) under varying conditions of normality and equality of covariance matrices. Specifically, it seeks to determine which of the two techniques classifies better on datasets that differ in these properties, and whether normality and equality of covariance matrices influence the predictive performance of each method. The study uses online stores’ customer sales data. Although the data were randomly generated, they approximate real data because the generation accounted for characteristics such as the mean and standard deviation of purchases of a particular type of product over a given period. Varying these parameters yielded datasets that approximate real-world data. Each dataset was classified with LDA and QDA, and the ROC-AUC score served as the performance metric for each method. Statistical comparison of these scores showed which method performed better under which conditions. The results indicate that LDA performs better than QDA when classifying online stores’ customers based solely on their purchasing habits, and that LDA is insensitive to changes in both normality and equality of covariance matrices. These results can help businesses with online stores choose a classification method suited to the type of distribution in their dataset.
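The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code: it assumes a single feature (a purchase amount), two equally sized customer classes, normally distributed data, and hypothetical means and standard deviations chosen only for illustration. The function names (`simulate`, `fit`, `compare`, and so on) are likewise invented here. With one feature, LDA scores each observation using a variance pooled across classes, while QDA uses a separate variance per class; ROC-AUC is then computed on held-out data for both.

```python
import math
import random
from statistics import mean, pvariance


def simulate(n, mu0, sd0, mu1, sd1, rng):
    """Draw n synthetic purchase amounts per class (labels 0 and 1)."""
    xs = [rng.gauss(mu0, sd0) for _ in range(n)]
    xs += [rng.gauss(mu1, sd1) for _ in range(n)]
    ys = [0] * n + [1] * n
    return xs, ys


def fit(xs, ys):
    """Estimate per-class (mean, variance, count) and the pooled variance."""
    groups = {c: [x for x, y in zip(xs, ys) if y == c] for c in (0, 1)}
    stats = {c: (mean(g), pvariance(g), len(g)) for c, g in groups.items()}
    pooled = sum(var * cnt for _, var, cnt in stats.values()) / len(xs)
    return stats, pooled


def log_gauss(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


# Classes are equally sized here, so the log-prior terms cancel in both scores.
def lda_score(x, stats, pooled):
    """LDA: both classes share the pooled variance, so the score is linear in x."""
    return log_gauss(x, stats[1][0], pooled) - log_gauss(x, stats[0][0], pooled)


def qda_score(x, stats):
    """QDA: each class keeps its own variance, so the score is quadratic in x."""
    return log_gauss(x, *stats[1][:2]) - log_gauss(x, *stats[0][:2])


def roc_auc(scores, ys):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def compare(sd0, sd1, seed=0, n=300):
    """Fit on one synthetic sample, then score LDA and QDA on a fresh one."""
    rng = random.Random(seed)
    # Class means of 50 and 60 are illustrative assumptions, not thesis values.
    train = simulate(n, 50.0, sd0, 60.0, sd1, rng)
    xs, ys = simulate(n, 50.0, sd0, 60.0, sd1, rng)
    stats, pooled = fit(*train)
    auc_lda = roc_auc([lda_score(x, stats, pooled) for x in xs], ys)
    auc_qda = roc_auc([qda_score(x, stats) for x in xs], ys)
    return auc_lda, auc_qda


if __name__ == "__main__":
    print("equal variances   (LDA, QDA):", compare(10.0, 10.0))
    print("unequal variances (LDA, QDA):", compare(5.0, 20.0))
```

Repeating `compare` over many seeds and parameter settings would give two samples of ROC-AUC scores that could then be compared statistically. Swapping `rng.gauss` for a skewed generator would be one way to vary the normality condition as well.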
