
Email Classification Techniques—A Review

  • Conference paper
Data Science and Intelligent Applications

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 52)

Abstract

Email has become a significant communication medium. Official, personal, social, promotional and other messages arrive in our mailboxes every day. Research has found that the average office worker receives 121 messages per day. Because the inbox is flooded with messages, some emails remain unattended; classifying messages into priority folders would address the problem of unattended or unanswered mail. In this paper, we identify the key features used in email classification: temporal, behavioral, single-email multinomial-valued, content, and local and global features. We also study the datasets, techniques and tools used in various email classification tasks, such as spam, phishing, multifolder and machine-generated email classification. Different email classifiers provide different mechanisms for classification. Challenges in email classification are discussed. The study finds that the J48 classification algorithm works best for spam and ham email classification. Compared with other email service providers, Microsoft Outlook filters mail based on many criteria.




Author information

Corresponding author

Correspondence to Namrata Shroff.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shroff, N., Sinhgala, A. (2021). Email Classification Techniques—A Review. In: Kotecha, K., Piuri, V., Shah, H., Patel, R. (eds) Data Science and Intelligent Applications. Lecture Notes on Data Engineering and Communications Technologies, vol 52. Springer, Singapore. https://doi.org/10.1007/978-981-15-4474-3_21
