Intrusion Detection with Supervised Machine Learning using SMOTE for Imbalanced Datasets
Keywords:
Imbalanced datasets, Intrusion detection, Machine learning, NSL-KDD dataset, SMOTE algorithm
Abstract
This study examines intrusion detection with supervised machine learning and the effect of class imbalance on classifier performance. A major difficulty in intrusion detection is that datasets are imbalanced: some classes are significantly underrepresented compared to the others. To address this issue, the study uses the NSL-KDD dataset, which contains five classes: Benign, DoS, Probe, R2L, and U2R. Three classifiers, namely Decision Tree, k-nearest neighbors (KNN), and Linear SVC, are trained to distinguish these classes, and their error rates are measured before and after applying the Synthetic Minority Over-sampling Technique (SMOTE), a popular technique for balancing imbalanced datasets. Before applying SMOTE, Linear SVC had the highest error rate (0.278389), followed by KNN (0.242060) and Decision Tree (0.237890), so Decision Tree was the best-performing of the three classifiers. After applying SMOTE, KNN had the lowest error rate (0.231281), followed by Linear SVC (0.234608) and Decision Tree (0.244100), so KNN became the best-performing classifier. SMOTE therefore lowered the error rates of KNN and Linear SVC, while the error rate of Decision Tree rose slightly. These results demonstrate that machine learning algorithms can detect intrusions effectively and that SMOTE can mitigate the effect of imbalanced datasets, although the benefit depends on the classifier. The findings may have implications for developing more accurate intrusion detection systems in the future.
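The oversampling step and the error-rate metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard SMOTE idea of interpolating between a minority sample and one of its nearest minority-class neighbours; it is not the implementation used in the study. The function names `smote` and `error_rate`, the parameter `k`, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Sketch of SMOTE: create n_synthetic points, each lying on the line
    segment between a minority sample and one of its k nearest
    minority-class neighbours (hypothetical helper, not the study's code)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (1 - accuracy)."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Toy data (invented for illustration, not taken from NSL-KDD):
# 20 majority samples and 4 minority samples with two features each.
majority = [[float(i), float(i % 3)] for i in range(20)]
minority = [[100.0, 100.0], [101.0, 100.0], [100.0, 102.0], [102.0, 101.0]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

In practice, oversampling of this kind is applied to the training split only, so that the error rate is still measured on real, unmodified test samples.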