Near-Duplicate Video Cleansing Method Based on Locality Sensitive Hashing and the Sorted Neighborhood Method

  • Ou Ye
  • Zhanli Li
  • Yun Zhang
Conference paper
Part of the EAI/Springer Innovations in Communication and Computing book series (EAISICC)


With the wide use of intelligent video surveillance technology, increasing numbers of near-duplicate videos are being generated, which seriously degrades the quality of video data sets. Automatically cleaning this dirty data from video data sets has become an urgent problem. This chapter presents a near-duplicate video cleansing method based on locality sensitive hashing (LSH) and the sorted neighborhood method (SNM) to address it. First, speeded-up robust features (SURF) are extracted from the videos and a sorted candidate set is built using LSH; on this basis, the near-duplicate videos are cleaned using the SNM. Finally, simulation experiments show that the presented method is effective and can be used to clean near-duplicate videos automatically and improve video data quality.
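The pipeline outlined above (features, then an LSH-ordered candidate set, then an SNM pass) can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes each video has already been reduced to one fixed-length feature vector (for example, pooled SURF descriptors), uses random-hyperplane LSH as one possible hash family, and compares records by cosine similarity. The function names, the window size, the number of hash bits, and the threshold are all hypothetical choices made for illustration.

```python
import math
import random


def lsh_signature(vec, hyperplanes):
    """Sign-random-projection signature: one bit per hyperplane (assumed hash family)."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def clean_near_duplicates(features, n_bits=16, window=3, threshold=0.95, seed=0):
    """Return (kept_ids, removed_ids) for a dict {video_id: feature_vector}.

    Hypothetical sketch: parameters are illustrative, not taken from the chapter.
    """
    if not features:
        return [], set()

    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    # Sorted candidate set: order videos by LSH signature so that similar
    # videos tend to end up next to each other (ties broken by video id).
    ordered = sorted(
        features,
        key=lambda vid: (lsh_signature(features[vid], hyperplanes), vid),
    )

    # Sorted neighborhood pass: compare each record only with the next
    # (window - 1) records instead of with every other record.
    removed = set()
    for i, vid in enumerate(ordered):
        if vid in removed:
            continue
        for other in ordered[i + 1 : i + window]:
            if other in removed:
                continue
            if cosine(features[vid], features[other]) >= threshold:
                removed.add(other)  # keep the earlier record, drop the later one

    kept = [vid for vid in ordered if vid not in removed]
    return kept, removed


if __name__ == "__main__":
    demo = {
        "a": [1.0, 0.0, 0.0, 0.0],
        "a_copy": [2.0, 0.0, 0.0, 0.0],  # same direction as "a"
        "b": [0.0, 1.0, 0.0, 0.0],
    }
    kept, removed = clean_near_duplicates(demo)
    print("kept:", kept)
    print("removed:", removed)
```

Sorting by hash signature keeps the pairwise comparisons local to a small window, which is the usual motivation for combining a blocking key with the SNM. The trade-off is that two near-duplicates whose signatures differ in a leading bit can land far apart and be missed, so in practice several hash tables or sort passes might be needed.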


Keywords: Data quality; Dirty data; Video cleansing; Near-duplicate video; LSH; SNM



This work was supported in part by the Shaanxi Provincial Department of Education special scientific research project (No. 16JK1505).



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, China
