Comparison of System Call Representations for Intrusion Detection

Wunderlich, Sarah; Ring, Markus; Landes, Dieter; Hotho, Andreas

doi:10.1007/978-3-030-20005-3_2


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 951)


Abstract

Over the years, artificial neural networks have been applied successfully in many areas, including IT security. Yet neural networks can only process continuous input data, which is particularly challenging for security-related non-continuous data such as system calls. This work focuses on four different options for preprocessing sequences of system calls so that they can be processed by neural networks. These input options are based on one-hot encoding and on learning word2vec and GloVe representations of system calls. As an additional option, we analyze whether mapping system calls to their respective kernel modules is an adequate generalization step for (a) replacing system calls or (b) enhancing system call data with additional information about their context. When performing such preprocessing steps, however, it is important to ensure that no relevant information is lost. The overall objective of system call based intrusion detection is to classify sequences of system calls as benign or malicious behavior; we therefore use this scenario to evaluate the different input options as a classification task. The results show that each of the four methods is a valid way to preprocess input data, but using kernel modules alone is not recommended because too much information is lost during the mapping.
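The one-hot option and the two kernel-module variants (a) and (b) can be sketched as below. This is a minimal illustration, not the authors' implementation: the `SYSCALL_TO_MODULE` table is a small hypothetical mapping of a few Linux system calls to the kernel source directory that implements them, and the function names are introduced here for illustration only. The word2vec and GloVe options would replace each one-hot row with a dense vector learned from the traces; that training step is not shown.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical, illustrative mapping of a few Linux system calls to the
# kernel source directory ("module") that implements them. This is NOT the
# mapping used in the paper; it only shows the shape of the preprocessing.
SYSCALL_TO_MODULE: Dict[str, str] = {
    "open": "fs", "close": "fs", "read": "fs", "write": "fs",
    "execve": "fs", "mmap": "mm", "brk": "mm",
    "fork": "kernel", "socket": "net",
}


def build_index(tokens: Iterable[str]) -> Dict[str, int]:
    """Map each distinct token to an integer position (sorted for stability)."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


def one_hot(token: str, index: Dict[str, int]) -> List[int]:
    """Return len(index) zeros with a single 1 at the token's position."""
    vec = [0] * len(index)
    vec[index[token]] = 1
    return vec


def encode_sequence(calls: Sequence[str], mode: str = "syscall") -> List[List[int]]:
    """Encode a system call trace as one vector per call.

    mode="syscall": one-hot of the system call itself.
    mode="module":  one-hot of its kernel module only (replacement, option a).
    mode="both":    system call one-hot concatenated with the module one-hot
                    (enrichment, option b).
    """
    call_index = build_index(SYSCALL_TO_MODULE.keys())
    module_index = build_index(SYSCALL_TO_MODULE.values())
    encoded = []
    for call in calls:
        module = SYSCALL_TO_MODULE[call]
        if mode == "syscall":
            encoded.append(one_hot(call, call_index))
        elif mode == "module":
            encoded.append(one_hot(module, module_index))
        elif mode == "both":
            encoded.append(one_hot(call, call_index) + one_hot(module, module_index))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return encoded


if __name__ == "__main__":
    trace = ["open", "read", "write", "close"]
    for mode in ("syscall", "module", "both"):
        vectors = encode_sequence(trace, mode)
        # Print vector width and whether "read" and "write" are indistinguishable.
        print(mode, len(vectors[0]), vectors[1] == vectors[2])
```

With `mode="module"`, `read` and `write` both map to `fs` and receive identical vectors, which illustrates on a toy scale the information loss that makes the module-only option unattractive, while `mode="both"` keeps the original distinction and only adds context.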



Acknowledgements

S.W. is funded by the Bavarian State Ministry of Science and Arts in the framework of the Centre Digitization.Bavaria (ZD.B). S.W. and M.R. are further supported by the BayWISS Consortium Digitization. Last but not least, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information

Authors and Affiliations

Coburg University of Applied Sciences and Arts, Coburg, Germany
Sarah Wunderlich, Markus Ring & Dieter Landes
Data Mining and Information Retrieval Group, University of Würzburg, Würzburg, Germany
Andreas Hotho


Corresponding author

Correspondence to Sarah Wunderlich.

Editor information

Editors and Affiliations

Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Francisco Martínez Álvarez
Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Alicia Troncoso Lora
University of Salamanca, Salamanca, Spain
José António Sáez Muñoz
Department of Industrial Engineering, University of A Coruña, A Coruña, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado


About this paper

Cite this paper

Wunderlich, S., Ring, M., Landes, D., Hotho, A. (2020). Comparison of System Call Representations for Intrusion Detection. In: Martínez Álvarez, F., Troncoso Lora, A., Sáez Muñoz, J., Quintián, H., Corchado, E. (eds) International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019). CISIS ICEUTE 2019. Advances in Intelligent Systems and Computing, vol 951. Springer, Cham. https://doi.org/10.1007/978-3-030-20005-3_2


DOI: https://doi.org/10.1007/978-3-030-20005-3_2
Published: 28 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20004-6
Online ISBN: 978-3-030-20005-3
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
