Skip to main content

Abstract

Over the years, artificial neural networks have been applied successfully in many areas including IT security. Yet, neural networks can only process continuous input data. This is particularly challenging for security-related non-continuous data like system calls. This work focuses on four different options to preprocess sequences of system calls so that they can be processed by neural networks. These input options are based on one-hot encoding and learning word2vec and GloVe representations of system calls. As an additional option, we analyze if the mapping of system calls to their respective kernel modules is an adequate generalization step for (a) replacing system calls or (b) enhancing system call data with additional information regarding their context. However, when performing such preprocessing steps it is important to ensure that no relevant information is lost during the process. The overall objective of system call based intrusion detection is to categorize sequences of system calls as benign or malicious behavior. Therefore, this scenario is used to evaluate the different input options as a classification task. The results show that each of the four different methods is valid when preprocessing input data, but the use of kernel modules only is not recommended because too much information is being lost during the mapping process.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Creech, G., Hu, J.: A semantic approach to host-based intrusion detection systems using contiguous and discontiguous system call patterns. IEEE Trans. Comput. 63(4), 807–819 (2014)

    Article  MathSciNet  Google Scholar 

  2. Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A sense of self for unix processes. In: IEEE Symposium on Security and Privacy, pp. 120–128. IEEE (1996)

    Google Scholar 

  3. Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion detection using sequences of system calls. J. Comput. Secur. 6(3), 151–180 (1998)

    Article  Google Scholar 

  4. Kim, G., Yi, H., Lee, J., Paek, Y., Yoon, S.: LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems. arXiv preprint arXiv:1611.01726, pp. 1–12 (2016)

  5. Kolosnjaji, B., Zarras, A., Webster, G., Eckert, C.: Deep learning for classification of malware system call sequences. In: Australasian Joint Conference on Artificial Intelligence (AI), pp. 137–149. Springer (2016)

    Google Scholar 

  6. Sharma, A., Pujari, A.K., Paliwal, K.K.: Intrusion detection using text processing techniques with a kernel based similarity measure. Comput. Secur. 26(7–8), 488–495 (2007)

    Article  Google Scholar 

  7. Murtaza, S.S., Khreich, W., Hamou-Lhadj, A., Gagnon, S.: A trace abstraction approach for host-based anomaly detection. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), pp. 170–177. IEEE (2015)

    Google Scholar 

  8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  9. Creech, G., Hu, J.: Generation of a new IDS test dataset: time to retire the KDD collection. In: IEEE Wireless Communications and Networking Conference (WCNC), pp. 4487–4492. IEEE (2013)

    Google Scholar 

  10. Eskin, E., Lee, W., Stolfo, S.J.: Modeling system calls for intrusion detection with dynamic window sizes. In: DARPA Information Survivability Conference & Exposition II (DISCEX), vol. 1, pp. 165–175. IEEE (2001)

    Google Scholar 

  11. Hoang, X.D., Hu, J.: An efficient hidden markov model training scheme for anomaly intrusion detection of server applications based on system calls. In: IEEE International Conference on Networks (ICon), vol. 2, pp. 470–474. IEEE (2004)

    Google Scholar 

  12. Kosoresow, A.P., Hofmeyer, S.: Intrusion detection via system call traces. IEEE Softw. 14(5), 35–42 (1997)

    Article  Google Scholar 

  13. Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S.: A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data. Appl. Data Min. Comput. Secur. 6, 77–102 (2002)

    Google Scholar 

  14. Wang, Y., Wong, J., Miner, A.: Anomaly intrusion detection using one class SVM. In: IEEE SMC Information Assurance Workshop, pp. 358–364. IEEE (2004)

    Google Scholar 

  15. Chawla, A., Lee, B., Fallon, S., Jacob, P.: Host based intrusion detection system with combined CNN/RNN model. In: International Workshop on AI in Security, pp. 9–18 (2018)

    Google Scholar 

  16. Xie, M., Hu, J., Yu, X., Chang, E.: Evaluating host-based anomaly detection systems: application of the frequency-based algorithms to ADFA-LD. In: International Conference on Network and System Security, pp. 542–549. Springer (2014)

    Google Scholar 

  17. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)

    Google Scholar 

  18. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

    Google Scholar 

  19. Ring, M., Landes, D., Dallmann, A., Hotho, A.: IP2Vec: learning similarities between IP adresses. In: Workshop on Data Mining for Cyber Security (DMCS), International Conference on Data Mining Workshops (ICDMW), pp. 657–666. IEEE (2017)

    Google Scholar 

  20. [Online] DARPA 1998/1999 Dataset. https://www.ll.mit.edu/r-d/datasets. Accessed 14 Mar 2019

  21. [Online] UNM Dataset. https://www.cs.unm.edu/~immsec/systemcalls.htm. Accessed 14 Nov 2018

  22. Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: alternative data models. In: IEEE Symposium on Security and Privacy, pp. 133–145. IEEE (1999)

    Google Scholar 

  23. McHugh, J.: Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln laboratory. ACM Trans. Inform. Syst. Secur. (TISSEC) 3(4), 262–294 (2000)

    Article  Google Scholar 

  24. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)

    Article  Google Scholar 

  25. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)

    MathSciNet  MATH  Google Scholar 

  26. Gers, F.A., Schraudolph, N.N., Schmidhuber, J.: Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 3, 115–143 (2002)

    MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

S.W. is funded by the Bavarian State Ministry of Science and Arts in the framework of the Centre Digitization.Bavaria (ZD.B). S.W. and M.R. are further supported by the BayWISS Consortium Digitization. Last but not least, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sarah Wunderlich .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Wunderlich, S., Ring, M., Landes, D., Hotho, A. (2020). Comparison of System Call Representations for Intrusion Detection. In: Martínez Álvarez, F., Troncoso Lora, A., Sáez Muñoz, J., Quintián, H., Corchado, E. (eds) International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019). CISIS ICEUTE 2019 2019. Advances in Intelligent Systems and Computing, vol 951. Springer, Cham. https://doi.org/10.1007/978-3-030-20005-3_2

Download citation

Publish with us

Policies and ethics