Anonymization of System Logs for Preserving Privacy and Reducing Storage

  • Conference paper
Advances in Information and Communication Networks (FICC 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 887)

Abstract

System logs are a valuable source of information for analyzing and diagnosing system behavior, but analyzing large log volumes is highly time-consuming. For many parallel computing centers, outsourcing the analysis of system logs (syslogs) to third parties is the only option. A general analysis and diagnosis solution is therefore needed, and such a solution is possible only by analyzing syslogs from multiple computing systems. However, the data within syslogs can be sensitive, which obstructs sharing them across institutions, with third-party entities, or in the public domain. This work proposes a new method for the anonymization of syslogs that employs de-identification and encoding to provide fully shareable system logs. In addition to eliminating the sensitive data within the test logs, the proposed anonymization method provides a 25% performance improvement in post-processing of the anonymized syslogs and reduces their required storage space by more than 80%.
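The abstract names de-identification and encoding without giving details, so the following is only a minimal sketch of what such a pipeline might look like, not the authors' method. It assumes that sensitive fields are IPv4 addresses and user names, that de-identification is done with a salted SHA-3 hash, and that encoding means replacing each distinct message with a short integer code. The regular expressions, the salt, the log line format, and the names `pseudonym`, `deidentify`, and `TemplateEncoder` are all hypothetical.

```python
import hashlib
import re

# Hypothetical patterns for sensitive fields; the paper's actual rules
# are not given in the abstract.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER = re.compile(r"(\buser[ =])(\w+)")

SALT = b"example-salt"  # assumption: a secret salt kept by the log owner


def pseudonym(value: str, prefix: str) -> str:
    """Map a sensitive value to a stable, non-reversible token."""
    digest = hashlib.sha3_256(SALT + value.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:8]}"


def deidentify(line: str) -> str:
    """Replace IPv4 addresses and user names with pseudonyms."""
    line = IPV4.sub(lambda m: pseudonym(m.group(0), "ip"), line)
    line = USER.sub(lambda m: m.group(1) + pseudonym(m.group(2), "usr"), line)
    return line


class TemplateEncoder:
    """Dictionary encoding: each distinct message gets a small integer code."""

    def __init__(self) -> None:
        self.codes: dict[str, int] = {}

    def encode(self, message: str) -> int:
        # Assign the next free code the first time a message is seen.
        if message not in self.codes:
            self.codes[message] = len(self.codes)
        return self.codes[message]


if __name__ == "__main__":
    # Hypothetical log line, for illustration only.
    raw = "Jun  6 10:15:01 node42 sshd[1234]: session opened for user alice from 10.0.0.5"
    clean = deidentify(raw)
    encoder = TemplateEncoder()
    print(clean)
    print(encoder.encode(clean))
```

Because the same input always yields the same pseudonym, repeated events remain correlatable after de-identification, and repeated messages collapse to a single code, which is one plausible way such a scheme could reduce storage.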

Notes

  1. https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/SystemTaurus.

Acknowledgement

This work is in part supported by the German Research Foundation (DFG) in the Cluster of Excellence “Center for Advancing Electronics Dresden” (cfaed). The authors also thank Holger Mickler and the administration team of Technical University of Dresden, Germany for their support in collecting the monitoring information on the Taurus high performance computing cluster.

Disclaimer. References to legal excerpts and regulations in this work are provided only to clarify the proposed approach and to enhance explanation. In no event will the authors of this work be liable for any incidental, indirect, consequential, or special damages of any kind, based on the information in these references.

Author information

Corresponding author

Correspondence to Siavash Ghiasvand.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ghiasvand, S., Ciorba, F.M. (2019). Anonymization of System Logs for Preserving Privacy and Reducing Storage. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-03405-4_11

  • DOI: https://doi.org/10.1007/978-3-030-03405-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03404-7

  • Online ISBN: 978-3-030-03405-4

  • eBook Packages: Engineering, Engineering (R0)
