Meta-path-based outlier detection in heterogeneous information network

  • Research Article
Frontiers of Computer Science

Abstract

Mining outliers in heterogeneous networks is crucial to many applications, but it poses many challenges. In this paper, we focus on identifying meta-path-based outliers in heterogeneous information networks (HINs) and compute the similarity between different types of objects. We propose a meta-path-based outlier detection method, MPOutliers, that addresses these problems within a single unified framework. MPOutliers calculates a heterogeneous reachable probability by combining different types of objects and their relationships, so it captures the semantic information among nodes in a heterogeneous network rather than considering only the network structure. It also computes the closeness degree between nodes of the same type, which extends the analysis to the whole heterogeneous network. Moreover, each node is assigned a reliability weight that measures its degree of authority. Extensive experiments on two real-world datasets (AMiner and Movies) show that the proposed method is effective and efficient for outlier detection.
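The abstract names three ingredients: a reachable probability along meta-paths, a closeness degree between same-type nodes, and a per-node authority weight. The full method is not reproduced on this page, so the following is only a minimal sketch under stated assumptions and not MPOutliers itself. It assumes the reachable probability is a random walk that follows the meta-path (the product of row-normalized relation matrices, in the spirit of HeteSim-style relevance measures), takes closeness as the symmetrized reachable probability, uses an author's paper count as a stand-in authority weight, and scores each node as one minus its weighted average closeness to the other nodes. The function names and the toy author-paper matrix are hypothetical.

```python
import numpy as np


def row_normalize(m):
    """Turn a nonnegative relation matrix into transition probabilities.

    Each row is divided by its sum; all-zero rows stay zero.
    """
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    out = np.zeros_like(m)
    np.divide(m, sums, out=out, where=sums > 0)
    return out


def reachable_probability(relations):
    """Hypothetical reachable probability along a meta-path.

    `relations` lists the relation matrices of the meta-path in order,
    e.g. [AP, AP.T] for Author-Paper-Author. Entry (i, j) of the result is
    the probability that a random walk following the meta-path from start
    node i ends at end node j.
    """
    prob = row_normalize(relations[0])
    for rel in relations[1:]:
        prob = prob @ row_normalize(rel)
    return prob


def closeness(prob):
    """Assumed closeness between same-type nodes: symmetrize the probabilities."""
    return (prob + prob.T) / 2


def outlier_scores(close, weights):
    """Assumed score: 1 - authority-weighted mean closeness to all other nodes."""
    weights = np.asarray(weights, dtype=float)
    n = close.shape[0]
    scores = np.empty(n)
    for i in range(n):
        others = np.arange(n) != i  # exclude self-closeness
        w = weights[others]
        scores[i] = 1.0 - np.dot(w, close[i, others]) / w.sum()
    return scores


if __name__ == "__main__":
    # Toy author x paper matrix: author 3 only writes a paper nobody else touches.
    AP = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 1]])
    prob = reachable_probability([AP, AP.T])          # Author-Paper-Author
    scores = outlier_scores(closeness(prob), AP.sum(axis=1))
    print(scores)
    print("most outlying author:", int(np.argmax(scores)))  # author 3
```

Higher scores mean a node is reached less often, via the chosen meta-path, from well-connected peers. Swapping in a different meta-path (for example Author-Paper-Venue-Paper-Author) only changes the list of matrices passed to `reachable_probability`.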

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61872163 and 61806084), China Postdoctoral Science Foundation project (2018M631872), and Jilin Provincial Education Department project (JJKH20190160KJ).

Author information

Correspondence to Lu Liu.

Additional information

Lu Liu received her PhD in computer science from Jilin University, China in 2017. She is currently a postdoctoral researcher at the College of Software, Jilin University, China. Her research interests include data mining, Web mining, and machine learning.

Shang Wang received her PhD in computer science from Jilin University, China in 2010. She is currently a visiting scholar at the New Jersey Institute of Technology, USA. Her research interests include image processing and computer graphics.

About this article

Cite this article

Liu, L., Wang, S. Meta-path-based outlier detection in heterogeneous information network. Front. Comput. Sci. 14, 388–403 (2020). https://doi.org/10.1007/s11704-018-7289-4

  • DOI: https://doi.org/10.1007/s11704-018-7289-4
