DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation

Aamir, Muhammad; Zaidi, Syed Mustafa Ali

doi:10.1007/s10207-019-00434-1

DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation

Regular contribution
Published: 11 April 2019

Volume 18, pages 761–785, (2019)
Cite this article

International Journal of Information Security Aims and scope Submit manuscript

Muhammad Aamir¹ &
Syed Mustafa Ali Zaidi¹

2789 Accesses
45 Citations
3 Altmetric
Explore all metrics

Abstract

This paper applies an organized flow of feature engineering and machine learning to detect distributed denial-of-service (DDoS) attacks. Feature engineering has a focus to obtain the datasets of different dimensions with significant features, using feature selection methods of backward elimination, chi2, and information gain scores. Different supervised machine learning models are applied on the feature-engineered datasets to demonstrate the adaptability of datasets for machine learning under optimal tuning of parameters within given sets of values. The results show that substantial feature reduction is possible to make DDoS detection faster and optimized with minimal performance hit. The paper proposes a strategic-level framework which incorporates the necessary elements of feature engineering and machine learning with a defined flow of experimentation. The models are also validated with cross-validation and evaluated for area-under-curve analyses. It provides comprehensive solutions which can be trusted to avoid the overfitting and collinearity problems of data while detecting DDoS attacks. In the case study of DDoS datasets, K-nearest neighbors algorithm overall exhibits the best performance followed by support vector machine, whereas low-dimensional datasets of discrete feature types perform better under the Random Forest model as compared to high dimensions with numerical features. The accuracy scores of dataset with the lowest number of features remain competitive with other datasets under all machine learning models, leading to a substantially reduced processing overhead. The experiments show that approximately 68% reduction in the feature space is possible with an impact of only about 0.03% on accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Cybersecurity data science: an overview from machine learning perspective

Article Open access 01 July 2020

Machine Learning for Intelligent Data Analysis and Automation in Cybersecurity: Current and Future Prospects

Article Open access 19 September 2022

A systematic literature review for network intrusion detection system (IDS)

Article 27 March 2023

Notes

The wrapper method applies a regressor on the identified feature and validates its significance with accuracies of regression. Another approach is the filter method which directly selects the features based on statistical scores without any validation approach of regression.
https://www.researchgate.net/publication/292945336_Detecting_Distributed_Denial_of_Service_Attacks_Using_Data_Mining_Techniques
The data curves are supposed to be normal distributions where sample size > 30. As the population mean is not known and we are only dealing with samples, the t-statistic test is applied
Epoch is a round of completion when all the records of a dataset have been fed to the neural network. If epoch No. 1 is just completed, all the records will again be fed in epoch No. 2 and so on. It is not necessary that all the records are fed simultaneously or sequentially in one batch for an epoch. This part is driven by the batch size parameter which is configurable for ANN.
Making a change in kernel according to the underlying data is termed as ‘kernel trick.’
CART is one of the decision tree algorithms. There is a bunch of others including ID3, C4.5, MARS (multivariate adaptive regression splines) etc.

References

Mitrokotsa, A., Douligeris, C.: Denial of Service Attacks, Network Security: Current Status and Future Directions, pp. 117–134. Wiley, Hoboken (2006)
Google Scholar
Zhang, L., Yu, S., Wu, D., Watters, P.: A survey on latest botnet attack and defense. In: 10th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, pp. 53–60 (2011)
State of the Internet Security—Q4 2017, Report from Akamai, 4(4), (2018)
Nagesh, K., Sumathy, R., Devakumar, P., Sathiyamurthy, K.: A survey on denial of service attacks and preclusions. In: International conference on informatics and analytics, p. 118 (2016)
KDD Cup 1999 Dataset. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
CAIDA DDoS Attack 2007 Dataset. http://www.caida.org/data/passive/ddos-20070804_dataset.xml
CAIDA Anonymized Internet Traces 2008 Dataset. http://www.caida.org/data/passive/passive_2008_dataset.xml
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: Symposium on Computational Intelligence for Security and Defense Applications (CISDA), IEEE, pp. 1–6 (2009)
ISOT Botnet Dataset. https://www.uvic.ca/engineering/ece/isot/datasets/index.php
The Honeynet Project. http://www.honeynet.org/chapters/france
Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 31(3), 357–374 (2012)
Article Google Scholar
Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Military Communications and Information Systems Conference (MilCIS), pp. 1–6 (2015)
Gao, Y., Feng, Y., Kawamoto, J., Sakurai, K.: A machine learning based approach for detecting DRDoS attacks and its performance evaluation. In: 11th Asia Joint Conference on Information Security (AsiaJCIS), pp. 80–86 (2016)
Singh, N.A., Singh, K.J., De, T.: Distributed denial of service attack detection using Naive Bayes classifier through info gain feature selection. In: International Conference on Informatics and Analytics, p. 54 (2016)
Azab, A., Alazab, M., Aiash, M.: Machine learning based botnet identification traffic. In: Trustcom/BigDataSE/I SPA, IEEE, pp. 1788–1794 (2016)
Yusof, A.R., Udzir, N.I., Selamat, A., Hamdan, H., Abdullah, M.T.: Adaptive feature selection for denial of services (DoS) attack. In: IEEE Conference on Application, Information and Network Security (AINS), IEEE, pp. 81–84 (2017)
Singh, K.J., De, T.: Efficient classification of DDoS attacks using an ensemble feature selection algorithm. J. Intell. Syst (2017). https://doi.org/10.1515/jisys-2017-0472
Khan, S., Gani, A., Wahab, A.W.A., Singh, P.K.: Feature selection of Denial-of-Service attacks using entropy and granular computing. Arab. J. Sci. Eng. 43(2), 499–508 (2018)
Article Google Scholar
Alejandre, F.V., Corts, N.C., Anaya, E.A.: Feature selection to detect botnets using machine learning algorithms. In: International Conference on Electronics, Communications and Computers (CONIELECOMP), pp. 1–7 (2017)
Al-Hawawreh, M.S.: SYN flood attack detection in cloud environment based on TCP/IP header statistical features. In: 8th International Conference on Information Technology (ICIT), pp. 236–243 (2017)
Li, J., Liu, Y., Gu, L.: DDoS attack detection based on neural network. In: 2nd International Symposium on Aware Computing (ISAC), pp. 196–199 (2010)
Agrawal, P.K., Gupta, B.B., Jain, S., Pattanshetti, M.K.: Estimating Strength of a DDoS Attack in Real Time Using ANN Based Scheme, Computer Networks and Intelligent Computing, pp. 301–310. Springer, Berlin (2011)
Google Scholar
Gupta, B.B., Joshi, R.C., Misra, M., Jain, A., Juyal, S., Prabhakar, R., Singh, A.K.: Predicting Number of Zombies in a DDoS Attack Using ANN Based Scheme, Information Technology and Mobile Communication, pp. 117–122. Springer, Berlin (2011)
Google Scholar
Bansal, A., Mahapatra, S.: A comparative analysis of machine learning techniques for botnet detection. In: 10th International Conference on Security of Information and Networks, pp. 91–98 (2017)
Lu, L., Feng, Y., Sakurai, K.: C&C session detection using random forest. In: 11th International Conference on Ubiquitous Information Management and Communication, p. 34 (2017)
Zekri, M., El Kafhali, S., Aboutabit, N., Saadi, Y.: DDoS attack detection using machine learning techniques in cloud computing environments. In: 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech), pp. 1–7 (2017)
Yuan, X., Li, C., Li, X.: DeepDefense: identifying DDoS attack via deep learning. In: International Conference on Smart Computing (SMARTCOMP), IEEE, pp. 1–8 (2017)
Alkasassbeh, M., Al-Naymat, G., Hassanat, A.B., Almseidin, M.: Detecting distributed denial of service attacks using data mining techniques. Int. J. Adv. Comput. Sci. Appl. 7(1), 436–445 (2016)
Google Scholar
Singh, K., Singh, P., Kumar, K.: Application layer HTTP-GET flood DDoS attacks: research landscape and challenges. Comput. Secur. 65, 344–372 (2017)
Article Google Scholar
Tripathi, N., Hubballi, N.: Slow rate denial of service attacks against HTTP/2 and detection. Comput. Secur. 72, 255–272 (2018)
Article Google Scholar
Jonker, M., King, A., Krupp, J., Rossow, C., Sperotto, A., Dainotti, A.: Millions of targets under attack: a macroscopic characterization of the DoS ecosystem. In: Internet Measurement Conference, pp. 100–113 (2017)
Aamir, M., Zaidi, M.A.: A survey on DDoS attack and defense strategies: from traditional schemes to current techniques. Interdiscip. Inf. Sci. 19(2), 173–200 (2013)
Google Scholar
Shakeel, F., Sabhitha, A.S., Sharma, S.: Exploratory review on class imbalance problem: an overview. In: 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–8 (2017)
Idhammad, M., Afdel, K., Belouch, M.: Semi-supervised machine learning approach for DDoS detection. Appl. Intell. 48, 1–16 (2018)
Article Google Scholar
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G.: Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73, 220–239 (2017)
Article Google Scholar
Miller, S., Busby-Earle, C.: The role of machine learning in botnet detection. In: 11th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 359–364 (2016)
Kirubavathi, G., Anitha, R.: Botnet detection via mining of traffic flow characteristics. Comput. Electr. Eng. 50, 91–101 (2016)
Article Google Scholar
Osanaiye, O., Choo, K.-K.R., Dlodlo, M.: Analysing feature selection and classification techniques for DDoS detection in cloud. In: Proceedings of Southern Africa Telecommunication (2016)
Larose, D.T., Larose, C.D.: k-Nearest neighbor algorithm. Discovering Knowledge in Data: an Introduction to Data Mining, 2nd edn, pp. 149–164. John Wiley & Sons (2014)
Wu, X., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)
Article Google Scholar
Suthaharan, S.: Support Vector Machine, Machine Learning Models and Algorithms for Big Data Classification, pp. 207–235. Springer, Berlin (2016)
Book MATH Google Scholar
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Article MATH Google Scholar
Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015). http://neuralnetworksanddeeplearning.com/
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
scikit-learn: Data science library for Python. https://pypi.org/project/scikit-learn/
TensorFlow: Open source ML platform. https://www.tensorflow.org/
Loh, W.-Y.: Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1(1), 14–23 (2011)
Article Google Scholar
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30(7), 1145–1159 (1997)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Shaheed Zulfikar Ali Bhutto Institute of Science & Technology (SZABIST), Karachi, Pakistan
Muhammad Aamir & Syed Mustafa Ali Zaidi

Authors

Muhammad Aamir
View author publications
You can also search for this author in PubMed Google Scholar
Syed Mustafa Ali Zaidi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Muhammad Aamir.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Aamir, M., Zaidi, S.M.A. DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation. Int. J. Inf. Secur. 18, 761–785 (2019). https://doi.org/10.1007/s10207-019-00434-1

Download citation

Published: 11 April 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10207-019-00434-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation

Abstract

Access this article

Similar content being viewed by others

Cybersecurity data science: an overview from machine learning perspective

Machine Learning for Intelligent Data Analysis and Automation in Cybersecurity: Current and Future Prospects

A systematic literature review for network intrusion detection system (IDS)

Notes

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation

Abstract

Access this article

Similar content being viewed by others

Cybersecurity data science: an overview from machine learning perspective

Machine Learning for Intelligent Data Analysis and Automation in Cybersecurity: Current and Future Prospects

A systematic literature review for network intrusion detection system (IDS)

Notes

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation