
GPU_MF_SGD: A Novel GPU-Based Stochastic Gradient Descent Method for Matrix Factorization

  • Conference paper in Advances in Information and Communication Networks (FICC 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 887)

Abstract

Recommender systems are used in most of today's applications. Providing accurate suggestions in real time is one of the most crucial challenges they face. Matrix factorization (MF) is an effective technique for recommender systems because it improves accuracy. Stochastic Gradient Descent (SGD) is the most popular approach for training MF models, but it is an inherently sequential algorithm that is not trivial to parallelize, especially for large-scale problems. Recently, many studies have proposed parallel SGD methods. In this research, we propose GPU_MF_SGD, a novel GPU-based SGD method for large-scale recommender systems. GPU_MF_SGD utilizes Graphics Processing Unit (GPU) resources by ensuring load balancing and linear scalability, and by achieving coalesced access to global memory without a preprocessing phase. Our method demonstrates a 3.1X–5.4X speedup over the state-of-the-art GPU method, CuMF_SGD.
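The per-rating SGD update that such methods parallelize can be sketched in plain sequential Python. This is a minimal illustration of the general SGD-for-MF update (factorizing a sparse ratings matrix R as P·Qᵀ), not the paper's GPU kernels; all hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative choices, not values from the paper.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=8, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Sequential SGD for matrix factorization: R ~= P @ Q.T.

    ratings: list of (user, item, rating) triples -- the observed sparse entries.
    GPU methods like GPU_MF_SGD run many of these updates concurrently,
    which is what makes load balancing and memory coalescing matter.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                 # prediction error for one rating
            pu = P[u].copy()                      # keep old P[u] for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])  # gradient step on user factors
            Q[i] += lr * (err * pu - reg * Q[i])    # gradient step on item factors
    return P, Q

# Tiny toy example: 3 users x 3 items, a few observed ratings.
obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 2, 4.0)]
P, Q = sgd_mf(obs, n_users=3, n_items=3)
rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in obs]))
```

Note that each update reads and writes the shared factor rows P[u] and Q[i]; two ratings sharing a user or item conflict on those rows, which is exactly why parallelizing SGD is nontrivial.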


References

  1. Ricci, F., et al.: Recommender Systems Handbook. Springer, New York (2011)

  2. Ekstrand, M.D., et al.: Collaborative filtering recommender systems. Found. Trends Hum. Comput. Interact. 4(2), 81–173 (2011)

  3. Poriya, A., et al.: Non-personalized recommender systems and user-based collaborative recommender systems. Int. J. Appl. Inf. Syst. 6(9), 22–27 (2014)

  4. Aamir, M., Bhusry, M.: Recommendation system: state of the art approach. Int. J. Comput. Appl. 120, 25–32 (2015)

  5. Recommender System. https://en.wikipedia.org/wiki/Recommender_system. Accessed 11 July 2017

  6. Jin, J., et al.: GPUSGD: a GPU-accelerated stochastic gradient descent algorithm for matrix factorization. Concurr. Comput. Pract. Exp. 28, 3844–3865 (2016)

  7. Xie, X., et al.: CuMF_SGD: parallelized stochastic gradient descent for matrix factorization on GPUs. In: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing. ACM (2017)

  8. Li, H., et al.: MSGD: a novel matrix factorization approach for large-scale collaborative filtering recommender systems on GPUs. IEEE Trans. Parallel Distrib. Syst. 29(7), 1530–1544 (2018)

  9. Nassar, M.A., El-Sayed, L.A.A., Taha, Y.: Efficient parallel stochastic gradient descent for matrix factorization using GPU. In: 2016 11th International Conference for Internet Technology and Secured Transactions (ICITST). IEEE (2016)

  10. Wen, Z.: Recommendation system based on collaborative filtering. In: CS229 Lecture Notes, Stanford University, December 2008

  11. Leskovec, J., et al.: Mining of Massive Datasets, Chap. 9, pp. 307–340. Cambridge University Press, Cambridge (2014)

  12. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

  13. Kaleem, R., et al.: Stochastic gradient descent on GPUs. In: Proceedings of the 8th Workshop on General Purpose Processing Using GPUs, pp. 81–89 (2015)

  14. Konstan, J.A., Riedl, J.: Recommender systems: from algorithms to user experience. User Model. User Adap. Inter. 22(1), 101–123 (2012)

  15. Anastasiu, D.C., et al.: Big Data and Recommender Systems (2016)

  16. Melville, P., Sindhwani, V.: Recommender systems. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning, pp. 829–838. Springer, New York (2011)

  17. Kant, V., Bharadwaj, K.K.: Enhancing recommendation quality of content-based filtering through collaborative predictions and fuzzy similarity measures. J. Proc. Eng. 38, 939–944 (2012)

  18. Ma, A., et al.: A FPGA-based accelerator for neighborhood-based collaborative filtering recommendation algorithms. In: Proceedings of IEEE International Conference on Cluster Computing, pp. 494–495, September 2015

  19. Anthony, V., Ayala, A., et al.: Speeding up collaborative filtering with parametrized preprocessing. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, August 2015

  20. Gates, M., et al.: Accelerating collaborative filtering using concepts from high performance computing. In: IEEE International Conference in Big Data (Big Data) (2015)

  21. Wang, Z., et al.: A CUDA-enabled parallel implementation of collaborative filtering. Proc. Comput. Sci. 30, 66–74 (2014)

  22. Gemulla, R., Nijkamp, E., Haas, P.J., Sismanis, Y.: Large-scale matrix factorization with distributed stochastic gradient descent. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2011)

  23. Chin, W.-S., et al.: A fast parallel stochastic gradient method for matrix factorization in shared memory systems. ACM Trans. Intell. Syst. Technol. 6(1), 2 (2015)

  24. Zastrau, D., Edelkamp, S.: Stochastic gradient descent with GPGPU. In: Proceedings of the 35th Annual German Conference on Advances in Artificial Intelligence (KI’12), pp. 193–204 (2012)

  25. Shah, A., Majumdar, A.: Accelerating low-rank matrix completion on GPUs. In: Proceedings of International Conference on Advances in Computing, Communications and Informatics, December 2014

  26. Kato, K., Hosino, T.: Singular value decomposition for collaborative filtering on a GPU. IOP Conf. Ser. Mater. Sci. Eng. 10(1), 012017 (2010)

  27. Foster, B., et al.: A GPU-based approximate SVD algorithm. In: Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics, vol. 1, pp. 569–578. Springer, Berlin (2012)

  28. Yu, H.-F., et al.: Parallel matrix factorization for recommender systems. Knowl. Inf. Syst. 41(3), 793–819 (2014)

  29. Yu, H.F., Hsieh, C.J., et al.: Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In: Proceedings of the IEEE 12th International Conference on Data Mining, pp. 765–774 (2012)

  30. Yun, H., Yu, H.-F., Hsieh, C.-J., Vishwanathan, S.V.N., Dhillon, I.: NOMAD: non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion. Proc. VLDB Endow. 7(11), 975–986 (2014)

  31. Yang, X., et al.: High performance coordinate descent matrix factorization for recommender systems. In: Proceedings of the Computing Frontiers Conference. ACM (2017)

  32. Zadeh, R., et al.: Matrix completion via alternating least square (ALS). In: CME 323 Lecture Notes, Stanford University, Spring (2016)

  33. Tan, W., Cao, L., Fong, L.: Faster and cheaper: parallelizing large-scale matrix factorization on GPUs. In: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016 (2016)

  34. Aberger, C.R.: Recommender: An Analysis of Collaborative Filtering Techniques (2016)

  35. Papamakarios, G.: Comparison of Modern Stochastic Optimization Algorithms (2014)

  36. Toulis, P., Airoldi, E., Rennie, J.: Statistical analysis of stochastic gradient methods for generalized linear models. In: International Conference on Machine Learning, pp. 667–675 (2014)

  37. Toulis, P., Tran, D., Airoldi, E.: Towards stability and optimality in stochastic gradient descent. In: Artificial Intelligence and Statistics, pp. 1290–1298 (2016)

  38. Zhou, Y., Wilkinson, D., et al.: Large-scale parallel collaborative filtering for the Netflix prize. In: Proceedings of International Conference on Algorithmic Aspects in Information and Management (2008)

  39. Xie, X., Tan, W., Fong, L.L., Liang, Y.: CuMF_SGD: fast and scalable matrix factorization (2016). arXiv preprint arXiv:1610.05838. https://github.com/cuMF/cumf_sgd

  40. Tang, K.: Collaborative filtering with batch stochastic gradient descent, July 2015. http://www.its.caltech.edu/~ktang/CS179/index.html

  41. Niu, F., et al.: HOGWILD!: a lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 693–701, June 2011

  42. Gemulla, R., et al.: Large-scale matrix factorization with distributed stochastic gradient descent. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 69–77 (2011)

  43. Zhang, H., Hsieh, C.-J., Akella, V.: Hogwild++: a new mechanism for decentralized asynchronous stochastic gradient descent. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 629–638. IEEE (2016)

  44. Zhang, C., Ré, C.: Dimmwitted: a study of main-memory statistical analytics. Proc. VLDB Endow. 7(12), 1283–1294 (2014)

  45. Udell, M., et al.: Generalized low rank models. Found. Trends Mach. Learn. 9(1), 1–118 (2016)

  46. CUDA C Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/#axzz4FH9nydq8. Accessed 5 Sept 2016

  47. Nunna, K.C., et al.: A survey on big data processing infrastructure: evolving role of FPGA. Int. J. Big Data Intell. 2(3), 145–156 (2015)

  48. Nassar, M.A., El-Sayed, L.A.A.: Radix-4 modified interleaved modular multiplier based on sign detection. In: International Conference on Computer Science and Information Technology, pp. 413–423. Springer, Berlin (2012)

  49. Nassar, M.A., El-Sayed, L.A.A.: Efficient interleaved modular multiplication based on sign detection. In: IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), November 2015

  50. Karydi, E., et al.: Parallel and distributed collaborative filtering: a survey. ACM Comput. Surv. 49(2), 37 (2016)

  51. Ma, X., Wang, C., Yu, Q., Li, X., Zhou, X.: A FPGA-based accelerator for neighborhood-based collaborative filtering recommendation algorithms. In: 2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 494–495. IEEE (2015)

  52. http://www.nvidia.com/object/tesla-k80.html. Accessed 22 July 2017

  53. Lathia, N.: Evaluating collaborative filtering over time. Ph.D. thesis (2010)

  54. Sparse Matrix. https://en.wikipedia.org/wiki/Sparse_matrix#Storing_a_sparse_matrix. Accessed 12 Feb 2017

  55. http://supercomputingblog.com/cuda/cudamemoryandcachearchitecture/. Accessed 26 June 2017

  56. GPU memory types – performance comparison. https://www.microway.com/hpc-tech-tips/gpu-memory-types. Accessed 5 Sept 2015

  57. Pankratius, V., et al.: Fundamentals of Multicore Software Development. CRC Press, Boca Raton (2011)

  58. del Mundo, C., Feng, W.: Enabling efficient intra-warp communication for fourier transforms in a many-core architecture. In: Proceedings of the 2013 ACM/IEEE International Conference on Supercomputing (2013)

  59. Han, T.D., Abdelrahman, T.S.: Reducing branch divergence in GPU programs. In: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, p. 3. ACM (2011)

  60. Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans. Interact. Intell. Syst. 5(4), 19 (2016)

  61. Gower, S.: Netflix prize and SVD, pp. 1–10. http://buzzard.ups.edu/courses/2014spring/420projects/math420-UPS-spring-2014-gower-netflix-SVD.pdf (2014)

  62. Bennett, J., Lanning, S.: The Netflix prize. In: Proceedings of KDD Cup and Workshop, p. 35 (2007)

  63. Dror, G., Koenigstein, N., Koren, Y., Weimer, M.: The Yahoo! music dataset and KDD-Cup’11. In: Proceedings of KDD Cup 2011, pp. 3–18 (2012)

  64. Zheng, L.: Performance evaluation of latent factor models for rating prediction. Ph.D. dissertation, University of Victoria (2015)

  65. Low, Y., et al.: GraphLab: a new parallel framework for machine learning. In: Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI-10, pp. 340–349, July 2010

  66. Chin, W.-S., et al.: A learning-rate schedule for stochastic gradient methods to matrix factorization. In: PAKDD, pp. 442–455 (2015)

  67. https://hpc.bibalex.org/. Accessed July 2017

  68. https://slurm.schedmd.com/. Accessed July 2017

  69. Shani, G., Gunawardana, A.: Evaluating recommendation systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P. (eds.) Recommender Systems Handbook, pp. 257–297. Springer, Boston (2011)

  70. Ginger, T., Bochkov, Y.: Predicting business ratings on yelp report (2015). http://cs229.stanford.edu/proj2015/013_report.pdf

  71. Hwu, W.: Efficient host-device data transfer. In: Lecture Notes, University of Illinois at Urbana-Champaign, December 2014

  72. Bhatnagar, A.: Accelerating a movie recommender system using VirtualCL on a heterogeneous GPU cluster. Master thesis, July 2015

Author information

Correspondence to Mohamed A. Nassar.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Nassar, M.A., El-Sayed, L.A.A., Taha, Y. (2019). GPU_MF_SGD: A Novel GPU-Based Stochastic Gradient Descent Method for Matrix Factorization. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-03405-4_18

  • Publisher: Springer, Cham

  • Print ISBN: 978-3-030-03404-7

  • Online ISBN: 978-3-030-03405-4