Supervised Machine Learning for Detecting Malicious URLs: An Evaluation of Different Models

Authors

Keywords:

Accuracy, Classification, Extra Trees Classifier, Malicious URLs, Random Forest Classifier, Supervised machine learning

Abstract

Malicious URLs are commonly used to distribute malware, steal personal information, and carry out phishing attacks. Traditional approaches to identifying these URLs are often ineffective, so researchers are exploring new methods to address the problem. In this study, we investigate the use of supervised machine learning models to detect malicious URLs. Our dataset consisted of 651,191 URLs, each labeled with one of four categories: benign, defacement, phishing, or malware. We employed several machine learning algorithms, including Decision Tree, Random Forest, AdaBoost, K Neighbors, SGD, Extra Trees, and Gaussian NB, and evaluated how accurately each classifies URLs into these categories. Accuracy scores ranged from 0.789548 to 0.914718, indicating that the models perform reasonably well in detecting malicious URLs. The Random Forest and Extra Trees classifiers achieved the highest accuracy scores, 0.914718 and 0.914711 respectively, while the Gaussian NB model had the lowest, 0.789548. These results show that supervised machine learning models can detect malicious URLs effectively, and that Random Forest and Extra Trees classifiers may be particularly well suited to the task. This research may provide a foundation for further development and improvement of machine learning-based systems for detecting malicious URLs, enhancing online security for individuals and organizations.
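The accuracy scores reported in the abstract are the fraction of URLs assigned the correct one of the four labels. As a minimal sketch of that metric and of ranking models by it, the following uses invented labels and two hypothetical models (`model_a`, `model_b`); the helper names `accuracy` and `rank_models` are assumptions for illustration, not the study's code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of URLs whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions):
    """Return (model_name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(y_true, y_pred)
              for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy ground truth over the four categories (illustrative only).
y_true = ["benign", "benign", "phishing", "malware",
          "defacement", "benign", "phishing", "malware"]

# Predictions from two hypothetical models (illustrative only).
predictions = {
    "model_a": ["benign", "benign", "phishing", "malware",
                "defacement", "benign", "benign", "malware"],
    "model_b": ["benign", "phishing", "phishing", "benign",
                "defacement", "benign", "benign", "malware"],
}

ranking = rank_models(y_true, predictions)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this toy data, `model_a` labels 7 of 8 URLs correctly and `model_b` labels 5 of 8, so `model_a` ranks first.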

Joan telo research

Published

2022-11-15

How to Cite

Telo, J. (2022). Supervised Machine Learning for Detecting Malicious URLs: An Evaluation of Different Models. Sage Science Review of Applied Machine Learning, 5(2), 30–46. Retrieved from https://journals.sagescience.org/index.php/ssraml/article/view/55