A unified latent variable model for contrastive opinion mining

Ibeke, Ebuka; Lin, Chenghua; Wyner, Adam; Barawi, Mohamad Hardyman

doi:10.1007/s11704-018-7073-5

Research Article
Published: 30 August 2019

Volume 14, pages 404–416, (2020)

Frontiers of Computer Science

Abstract

There are large and growing textual corpora in which people express contrastive opinions about the same topic. This has led to an increasing number of studies on contrastive opinion mining. However, existing studies have several notable limitations. They mostly focus on mining contrastive opinions from multiple data collections, which must be separated into their respective collections beforehand. In addition, existing models are opaque about the relationship between the extracted topics and the sentences in the corpus that express those topics; this opacity does not help us understand the opinions expressed in the corpus. Finally, contrastive opinion is mostly analysed qualitatively rather than quantitatively. This paper addresses these matters and proposes a novel unified latent variable model (contraLDA), which mines contrastive opinions from both single and multiple data collections, extracts the sentences that project the contrastive opinion, and measures the strength of opinion contrastiveness towards the extracted topics. Experimental results show the effectiveness of our model in mining contrastive opinions: it outperformed our baselines in extracting coherent and informative sentiment-bearing topics. We further show the accuracy of our model in classifying the topics and sentiments of textual data, comparing our results against five strong baselines.


Acknowledgments

This work was supported by an award from the UK Engineering and Physical Sciences Research Council (EP/P005810/1).

Author information

Authors and Affiliations

Department of Computing Science, University of Aberdeen, Aberdeen, AB24 3FX, UK
Ebuka Ibeke, Chenghua Lin, Adam Wyner & Mohamad Hardyman Barawi

Corresponding author

Correspondence to Chenghua Lin.

Additional information

Ebuka Ibeke is currently a PhD student at the University of Aberdeen and a Part-time Lecturer at the School of Computing Science and Digital Media, Robert Gordon University, UK. He received his BSc degree (2.1 division) in Computer Science from Nnamdi Azikiwe University, Nigeria in 2005 and an MSc degree (1st class Honours) in Information Engineering from Robert Gordon University, UK in 2011. His research interests include sentiment analysis, topic modelling, argumentation mining, and natural language generation/processing.

Chenghua Lin received the BEng degree in electrical engineering and automation from Beihang University, China in 2006, the MEng degree (First class Honours) in electronic engineering from the University of Reading (2007), and the PhD degree in computer science from the University of Exeter (2011). Currently, he is a SICSA Senior Lecturer (Associate Professor) in Computing Science at the University of Aberdeen, UK. His current research interests include integration of machine learning and natural language processing for sentiment analysis, intention mining, text summarisation, and natural language generation.

Adam Wyner is a Lecturer in Computing Science at the University of Aberdeen, UK. He has a PhD in Linguistics (Cornell University, USA, 1994) on the formal syntax and semantics of adverbial modification as well as a second PhD in Computer Science (King's College London, UK, 2008) on the representation and automation of legal concepts for e-contracting. He has worked as a research associate on two EU projects to formalise the law and argumentation to automatically process and to support policymaking. He is currently a co-investigator in a project to make historical legal texts machine readable. He has numerous publications on legal informatics, text analysis, language processing, and argumentation.

Mohamad Hardyman Barawi is currently a PhD student at the University of Aberdeen, UK and a Lecturer at the University Malaysia Sarawak, Malaysia. Prior to the academic post, he was an operation support specialist for Hewlett Packard Malaysia. He received his BSc degree (Hons) in Computer Science from University Putra Malaysia in 2003 and an MLIS degree in Library and Information Science from University of Malaya, Malaysia in 2006. His research interests include topic modelling, sentiment analysis, and text summarisation.
