The Impacts of User Experience Metrics on Click-Through Rate (CTR) in Digital Advertising: A Machine Learning Approach
Keywords:
Click-Through Rate, Feature Importance, Machine Learning, Online Advertising, Personalization, SMOTE, User Behavior
Abstract
Click-Through Rate (CTR) is a critical metric for both advertisers and publishers, as it directly impacts the effectiveness and profitability of online advertising campaigns. This study investigates the factors that influence CTR in online advertising. In contrast to previous research, which often focused on traditional variables such as ad placement and device type, we introduce new user experience metrics for predicting CTR: Personalization, Intrusiveness, Mobile Optimization, Loading Time, Brand Awareness, Scroll Length, and Ad Fatigue. These features were selected to capture a broader range of user experiences and interactions with online advertisements. The dataset is preprocessed by capping extreme values and label-encoding categorical variables. Because the dataset is imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to balance the classes. Logistic regression, decision tree, and random forest models are trained and evaluated on both the original and the SMOTE-balanced datasets. Correlation analysis reveals significant relationships, including a positive correlation between CTR and Personalization (0.47) and a negative correlation between CTR and Intrusiveness (-0.38). Feature importance analysis further highlights the critical role of Personalization, which has an importance score of 0.25, in predicting CTR. All three models exhibited strong predictive capabilities, particularly when trained on balanced data. Feature engineering had a mixed impact: it reduced the performance of logistic regression but did not significantly affect decision trees or random forests. We conclude by discussing the practical significance of these findings for digital advertising initiatives.
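To make the capping and class-balancing steps concrete, the following is a minimal, standard-library-only sketch of quantile capping and SMOTE-style oversampling. It is not the implementation used in this study: the function names (`cap_values`, `smote`), the quantile thresholds, the neighbour count `k`, and the toy two-feature data are all assumptions chosen for illustration. In practice, a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used instead.

```python
import math
import random


def cap_values(values, lower_q=0.01, upper_q=0.99):
    """Clip values to empirical quantiles (one simple way to cap extremes)."""
    ordered = sorted(values)
    lo = ordered[int(lower_q * (len(ordered) - 1))]
    hi = ordered[int(upper_q * (len(ordered) - 1))]
    return [min(max(v, lo), hi) for v in values]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two numeric rows.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to `base`.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: [personalization, intrusiveness] per ad impression.
data_rng = random.Random(42)
no_click = [[data_rng.uniform(0.0, 0.6), data_rng.uniform(0.4, 1.0)]
            for _ in range(90)]
click = [[data_rng.uniform(0.5, 1.0), data_rng.uniform(0.0, 0.5)]
         for _ in range(10)]

# Oversample the minority (click) class until both classes have equal size.
new_clicks = smote(click, n_new=len(no_click) - len(click))
balanced_click = click + new_clicks
print(len(no_click), len(balanced_click))
```

Because synthetic rows are interpolated only between existing minority rows, they stay inside the region the minority class already occupies. When models are compared on original and balanced data, oversampling is usually applied to the training split only, so that evaluation data contains no synthetic rows.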