Abstract
This paper explores the application of recurrent convolutional neural networks (RCNN) to text clustering, an unsupervised task in natural language processing (NLP). The RCNN is trained with pseudo-labels that are generated by pre-clustering on unsupervised document representations. To enhance the quality of pseudo-labels, the K-Nearest Neighbors (KNN) algorithm is used to select training samples for the neural network. After the deep feature representations of all documents have been obtained using the trained RCNN, the agglomerative hierarchical clustering (AHC) algorithm is used to cluster them. The experimental results on two public databases show that the proposed approach significantly boosts the performance of text clustering.
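The KNN-based sample selection described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function name `knn_filter`, the agreement threshold, and the toy feature vectors are all hypothetical, and a real pipeline would run this filter on unsupervised document representations before training the RCNN.

```python
import numpy as np

def knn_filter(features, pseudo_labels, k=3, agreement=0.5):
    """Keep samples whose k nearest neighbours (self excluded) agree with
    the sample's own pseudo-label at least `agreement` fraction of the
    time. Returns the indices of the retained samples."""
    n = len(features)
    # pairwise Euclidean distance matrix
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    keep = []
    for i in range(n):
        nn = np.argsort(dist[i])[:k]  # indices of the k nearest neighbours
        frac = np.mean(pseudo_labels[nn] == pseudo_labels[i])
        if frac >= agreement:
            keep.append(i)
    return np.array(keep)

# Two well-separated groups; sample 2 carries a noisy pseudo-label.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 1, 1, 1, 1])
kept = knn_filter(X, y, k=2)
print(kept)  # sample 2 is filtered out of the training set
```

The filtered subset would then serve as the pseudo-supervised training data for the RCNN, after which all documents are re-embedded and clustered with AHC.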
© 2020 Springer Nature Switzerland AG
Chen, Z., Guo, W. (2020). KNN-Based Pseudo-supervised RCNN Framework for Text Clustering. In: Liu, Y., Wang, L., Zhao, L., Yu, Z. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2019. Advances in Intelligent Systems and Computing, vol 1075. Springer, Cham. https://doi.org/10.1007/978-3-030-32591-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32590-9
Online ISBN: 978-3-030-32591-6