An Efficient Semantic Document Similarity Calculation Method Based on Double-Relations in Gene Ontology

Hu, Jingyu; Li, Meijing; Zhang, Zijun; Li, Kaitong

doi:10.1007/978-981-13-9710-3_19

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 157)

Abstract

Semantic text mining has been a challenging research topic in recent years. Many studies measure the similarity of two documents using ontologies such as Medical Subject Headings (MeSH) and Gene Ontology (GO). However, most of them consider only a single type of relationship in the ontology. To represent documents more comprehensively, a semantic document similarity calculation method is proposed that applies the Average Maximum Match algorithm with double-relations in GO. Experimental results show that the double-relations-based similarity calculation method outperforms traditional semantic similarity measurements.
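
As a rough illustration of the Average Maximum Match step named in the abstract, the sketch below scores two documents that are each represented as a list of ontology terms. It uses the commonly cited form of the algorithm: every term is matched to its most similar term in the other document, and the best-match scores from both directions are averaged. The abstract does not say how the double-relations term similarity is computed, so `term_sim` is left as a pluggable function. The names `average_maximum_match`, `toy_term_sim`, and `TOY_SIM`, the placeholder term IDs, and all similarity values are assumptions made for this example and are not taken from the chapter.

```python
from typing import Callable, Sequence


def average_maximum_match(
    terms_a: Sequence[str],
    terms_b: Sequence[str],
    term_sim: Callable[[str, str], float],
) -> float:
    """Average Maximum Match between two term lists.

    Each term is paired with its best match in the other list, and the
    best-match scores of both directions are averaged over all terms.
    """
    if not terms_a or not terms_b:
        return 0.0  # assumption: an empty document matches nothing
    best_a = sum(max(term_sim(a, b) for b in terms_b) for a in terms_a)
    best_b = sum(max(term_sim(b, a) for a in terms_a) for b in terms_b)
    return (best_a + best_b) / (len(terms_a) + len(terms_b))


# Hypothetical term-level similarities. In practice these would come from
# a GO-based term similarity measure; the IDs are placeholders, not real
# GO identifiers.
TOY_SIM = {
    frozenset(["GO:A", "GO:B"]): 0.5,
    frozenset(["GO:A", "GO:C"]): 0.2,
    frozenset(["GO:B", "GO:C"]): 0.8,
}


def toy_term_sim(t1: str, t2: str) -> float:
    """Symmetric lookup; identical terms score 1.0, unknown pairs 0.0."""
    if t1 == t2:
        return 1.0
    return TOY_SIM.get(frozenset([t1, t2]), 0.0)


doc1 = ["GO:A", "GO:B"]
doc2 = ["GO:B", "GO:C"]
score = average_maximum_match(doc1, doc2, toy_term_sim)
print(round(score, 3))  # 0.825
```

Keeping the term-level similarity behind a function argument means a measure that accounts for more than one relation type could be swapped in without changing the document-level aggregation.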

Acknowledgements

This study was supported by the National Natural Science Foundation of China (61702324).

Author information

Authors and Affiliations

College of Information Engineering, Shanghai Maritime University, Shanghai, China
Jingyu Hu, Meijing Li, Zijun Zhang & Kaitong Li

Corresponding author

Correspondence to Meijing Li.

Editor information

Editors and Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao Shi, Shandong, China
Jeng-Shyang Pan
Northeast Electric Power University, Chuanying Qu, Jilin, China
Jianpo Li
Swinburne University of Technology, Hawthorn, Melbourne, Australia
Pei-Wei Tsai
Centre for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Hu, J., Li, M., Zhang, Z., Li, K. (2020). An Efficient Semantic Document Similarity Calculation Method Based on Double-Relations in Gene Ontology. In: Pan, JS., Li, J., Tsai, PW., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol 157. Springer, Singapore. https://doi.org/10.1007/978-981-13-9710-3_19

DOI: https://doi.org/10.1007/978-981-13-9710-3_19
Published: 11 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9709-7
Online ISBN: 978-981-13-9710-3
eBook Packages: Intelligent Technologies and Robotics (R0)