DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation

  • Regular contribution
  • International Journal of Information Security

Abstract

This paper applies an organized flow of feature engineering and machine learning to detect distributed denial-of-service (DDoS) attacks. Feature engineering focuses on obtaining datasets of different dimensions that retain the significant features, using the feature selection methods of backward elimination, chi-squared (chi2) scores, and information gain scores. Different supervised machine learning models are applied to the feature-engineered datasets, with their parameters tuned optimally within given sets of values, to demonstrate how well the datasets suit machine learning. The results show that substantial feature reduction is possible, making DDoS detection faster and more efficient with minimal loss of performance. The paper proposes a strategic-level framework that incorporates the necessary elements of feature engineering and machine learning with a defined flow of experimentation. The models are also validated with cross-validation and evaluated with area-under-curve (AUC) analyses. The framework provides comprehensive solutions that can be trusted to avoid overfitting and collinearity problems in the data while detecting DDoS attacks. In the case study of DDoS datasets, the K-nearest neighbors algorithm shows the best overall performance, followed by the support vector machine, whereas under the Random Forest model, low-dimensional datasets of discrete feature types perform better than high-dimensional datasets with numerical features. The accuracy scores of the dataset with the fewest features remain competitive with those of the other datasets under all machine learning models, which substantially reduces processing overhead. The experiments show that the feature space can be reduced by approximately 68% with an impact of only about 0.03% on accuracy.
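As a rough illustration of the filter-style scores named in the abstract, the sketch below computes information gain and a Pearson chi-squared statistic for one discrete feature against the class label. It is a minimal stdlib-only sketch under stated assumptions, not the paper's implementation: the function names and the toy `protocol`/`label` data are hypothetical. Library implementations may differ; scikit-learn's `chi2`, for example, treats non-negative feature values as per-class frequencies rather than building a contingency table of categories, so its scores need not match this version.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Label entropy minus the expected label entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += (count / n) * entropy(subset)
    return entropy(labels) - remainder

def chi2_score(feature, labels):
    """Pearson chi-squared statistic of the feature-value x class-label contingency table."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    joint = Counter(zip(feature, labels))
    score = 0.0
    for f, fc in feature_counts.items():
        for y, yc in label_counts.items():
            expected = fc * yc / n  # count expected if feature and label were independent
            score += (joint[(f, y)] - expected) ** 2 / expected
    return score

# Hypothetical toy data: protocol type of six flows and whether each flow is an attack.
protocol = ["tcp", "udp", "udp", "tcp", "icmp", "udp"]
label = ["normal", "attack", "attack", "normal", "normal", "attack"]
constant = ["tcp"] * 6  # a feature carrying no information about the label

print(information_gain(protocol, label), chi2_score(protocol, label))
print(information_gain(constant, label), chi2_score(constant, label))
```

In a filter approach, features would be ranked by such scores and the lowest-ranked ones dropped; here `protocol` separates the classes perfectly and scores high, while the constant feature scores zero on both measures.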


Figs. 1 to 18


Notes

  1. The wrapper method fits a regressor on the identified features and validates each feature's significance through the resulting regression scores. The alternative, the filter method, selects features directly from statistical scores without any regression-based validation.

  2. https://www.researchgate.net/publication/292945336_Detecting_Distributed_Denial_of_Service_Attacks_Using_Data_Mining_Techniques

  3. The sample distributions are assumed to be approximately normal, since the sample size exceeds 30. Because the population mean and standard deviation are not known and only samples are available, the t-statistic test is applied.

  4. An epoch is one complete pass in which every record of the dataset has been fed to the neural network. Once epoch 1 is complete, all the records are fed again in epoch 2, and so on. Within an epoch, the records need not be fed all at once in a single batch; how many records are fed per step is governed by the batch size parameter, which is configurable for the ANN.

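  5. Choosing a kernel suited to the underlying data lets the SVM work in a higher-dimensional feature space by evaluating the kernel function on pairs of inputs, without ever computing that space explicitly; this is termed the ‘kernel trick.’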

  6. CART (classification and regression trees) is one of several decision tree algorithms; others include ID3 and C4.5. MARS (multivariate adaptive regression splines) is a related recursive-partitioning method.
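Note 1 describes the wrapper approach, in which a model's validation result decides whether a feature stays. The following is a minimal sketch of backward elimination run as a wrapper. It assumes, purely for illustration, a leave-one-out 1-nearest-neighbour accuracy as the validation score; this scorer is a swapped-in stand-in for the regression-based validation the note describes, and all function names and toy data are hypothetical.

```python
import math

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that only looks at `features`."""
    correct = 0
    for i, xi in enumerate(X):
        nearest, best_dist = None, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave the query record out of its own neighbourhood
            dist = sum((xi[k] - xj[k]) ** 2 for k in features)
            if dist < best_dist:
                nearest, best_dist = j, dist
        correct += int(y[nearest] == y[i])
    return correct / len(X)

def backward_elimination(X, y, score, tolerance=0.0):
    """Greedily drop features while the wrapper score stays within `tolerance` of the full-feature baseline."""
    selected = list(range(len(X[0])))
    baseline = score(X, y, selected)
    while len(selected) > 1:
        # Score every subset that removes exactly one of the remaining features.
        trials = [(score(X, y, [f for f in selected if f != drop]), drop) for drop in selected]
        best_score, drop = max(trials)  # ties go to the higher feature index
        if best_score < baseline - tolerance:
            break  # every possible removal would cost too much accuracy
        selected.remove(drop)
    return selected

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0, 5], [0, 1], [0, 9], [10, 2], [10, 8], [10, 4]]
y = [0, 0, 0, 1, 1, 1]
print(backward_elimination(X, y, loo_1nn_accuracy))
```

The `tolerance` parameter expresses the trade-off the abstract discusses: a small positive value allows features to be removed at the cost of a small, bounded drop in accuracy.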
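Note 3 mentions a t-statistic test but does not say which variant. The sketch below assumes Welch's two-sample t-statistic, which does not require equal variances, together with its Welch-Satterthwaite degrees of freedom. Student's pooled-variance test is the usual alternative. The samples are hypothetical and kept smaller than 30 for readability. A p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean, using the (n - 1) sample variance.
    va = statistics.variance(sample_a) / na
    vb = statistics.variance(sample_b) / nb
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical samples, e.g. accuracy scores from two feature sets (illustrative values only).
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
t, df = welch_t(a, b)
print(t, df)
```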
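The epoch and batch-size relationship in Note 4 can be sketched as a generator that walks over the dataset in fixed-size batches, once per epoch. This is an illustrative sketch, not the ANN training code used in the paper: the function name and the toy dataset are hypothetical, and reshuffling the records at the start of every epoch is an assumption (a common default in training loops), not something the note states.

```python
import random

def iterate_epochs(records, batch_size, epochs, seed=0):
    """Yield (epoch_number, batch) pairs; every record appears exactly once per epoch."""
    rng = random.Random(seed)  # seeded so the shuffle order is reproducible
    order = list(range(len(records)))
    for epoch in range(1, epochs + 1):
        rng.shuffle(order)  # assumption: records are reshuffled at the start of each epoch
        for start in range(0, len(order), batch_size):
            yield epoch, [records[i] for i in order[start:start + batch_size]]

records = list(range(10))  # hypothetical dataset of 10 records
batches = list(iterate_epochs(records, batch_size=4, epochs=2))
for epoch, batch in batches:
    print(epoch, batch)
```

With 10 records and a batch size of 4, each epoch takes ceil(10 / 4) = 3 steps, and the last batch holds the remaining 2 records.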
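To make Note 5 concrete, the sketch below checks the kernel trick numerically for one assumed kernel: the homogeneous degree-2 polynomial kernel on 2-D inputs. Evaluating k(x, y) = (x · y)^2 directly gives the same value as an inner product in an explicit 3-D feature space, so an SVM using this kernel never has to construct that space. The kernel choice and the function names are illustrative assumptions, not the kernels tuned in the paper.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def poly2_feature_map(x):
    """Explicit 3-D feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) matching poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, y))                               # kernel evaluated in the input space
print(dot(poly2_feature_map(x), poly2_feature_map(y)))  # same value via the explicit map
```

The two results agree up to floating-point rounding. For kernels such as the RBF kernel, the corresponding feature space is infinite-dimensional, so computing the kernel directly is the only practical option.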
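Note 6 names CART, which grows a tree by repeatedly choosing the binary split that most reduces an impurity measure. The sketch below finds the best single split on one numeric feature. It assumes the Gini impurity, CART's usual criterion for classification, and tries thresholds at the observed feature values; scikit-learn, for instance, places thresholds at midpoints between sorted values instead. Function names and data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold' on one feature."""
    n = len(labels)
    best = (None, gini(labels))  # no split yet: impurity of the whole node
    for threshold in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical feature (e.g. packets per second, scaled) and attack labels.
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
print(best_split(values, labels))
```

A full CART implementation would apply this search to every feature, recurse into the two child nodes, and stop or prune according to depth or impurity limits.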


Author information

Corresponding author

Correspondence to Muhammad Aamir.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


About this article

Cite this article

Aamir, M., Zaidi, S.M.A. DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation. Int. J. Inf. Secur. 18, 761–785 (2019). https://doi.org/10.1007/s10207-019-00434-1

  • DOI: https://doi.org/10.1007/s10207-019-00434-1
