Multilingual Entity Matching

  • Conference paper
Advanced Information Networking and Applications (AINA 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 926)

Abstract

The aim of this paper is to explore methods of multilingual entity matching. Name matching is currently the main technique used for entity resolution. When entities have features recorded in different languages and in different alphabets, the basic approaches have serious limitations. The basic name matching approach using string comparison metrics is enriched with phonetic rules and with relational information. The results show that the approach using transliteration enhanced by phonetic matching provides the best performance.
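To make the combination described in the abstract concrete, the following is a minimal sketch of name matching that transliterates first, then compares phonetic keys, and falls back to a string comparison metric. It is not the authors' implementation: the partial Cyrillic-to-Latin table, the choice of Soundex as the phonetic key, the helper names, and the `max_dist` threshold are all assumptions made for illustration.

```python
# Illustrative sketch only; table, function names, and threshold are assumptions.

# Partial Cyrillic-to-Latin transliteration table (hypothetical, not a standard).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch", "ы": "y",
    "э": "e", "ю": "yu", "я": "ya", "ь": "", "ъ": "",
}

# Soundex digit for each consonant; vowels, H, W, and Y have no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}


def transliterate(name: str) -> str:
    """Lowercase the name and map known Cyrillic letters to Latin."""
    return "".join(TRANSLIT.get(ch, ch) for ch in name.lower())


def soundex(name: str) -> str:
    """Four-character American Soundex key of a Latin-script name."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated digit is kept
    return (key + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        new_row = [i]
        for j, cb in enumerate(b, start=1):
            new_row.append(min(row[j] + 1,                # deletion
                               new_row[j - 1] + 1,        # insertion
                               row[j - 1] + (ca != cb)))  # substitution
        row = new_row
    return row[-1]


def names_match(a: str, b: str, max_dist: int = 2) -> bool:
    """Match if phonetic keys agree, or if the edit distance is small."""
    ta, tb = transliterate(a), transliterate(b)
    key_a, key_b = soundex(ta), soundex(tb)
    if key_a and key_a == key_b:
        return True
    return levenshtein(ta, tb) <= max_dist
```

Under these assumptions, `names_match("Мустафин", "Mustafeen")` succeeds through the phonetic key even though the spellings differ after transliteration. A production system would likely use a standard transliteration scheme and a phonetic algorithm suited to the languages involved.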

Notes

  1. https://www.wikidata.org/

Corresponding author

Correspondence to Ilgiz Mustafin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mustafin, I., Frunza, MC., Lee, J. (2020). Multilingual Entity Matching. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2019. Advances in Intelligent Systems and Computing, vol 926. Springer, Cham. https://doi.org/10.1007/978-3-030-15032-7_68
