Abstract
Recommender systems are used in most modern applications. Providing real-time suggestions with high accuracy is considered one of their most crucial challenges. Matrix factorization (MF) is an effective technique for recommender systems because it improves accuracy. Stochastic Gradient Descent (SGD) is the most popular approach for speeding up MF. SGD is a sequential algorithm, which is not trivial to parallelize, especially for large-scale problems. Recently, many studies have proposed methods for parallelizing SGD. In this research, we propose GPU_MF_SGD, a novel GPU-based method for large-scale recommender systems. GPU_MF_SGD utilizes Graphics Processing Unit (GPU) resources by ensuring load balancing and linear scalability, and by achieving coalesced access to global memory without a preprocessing phase. Our method demonstrates a 3.1X–5.4X speedup over the state-of-the-art GPU method, CuMF_SGD.
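To make the sequential nature of SGD for MF concrete, the following is a minimal sketch of the standard textbook update in plain Python. It is not the GPU_MF_SGD method: the function names (`init_factors`, `predict`, `rmse`, `sgd_epoch`), the toy ratings, and the hyperparameters (`lr`, `reg`, `k`) are all assumptions chosen for illustration.

```python
import math
import random


def init_factors(n_rows, k, rng):
    """Create an n_rows x k factor matrix with small positive random entries."""
    return [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_rows)]


def predict(P, Q, u, i):
    """Predicted rating: dot product of user row P[u] and item row Q[i]."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed (user, item, rating) triples."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


def sgd_epoch(ratings, P, Q, lr, reg, rng):
    """One sequential pass over the ratings in random order, updating in place."""
    order = list(ratings)
    rng.shuffle(order)
    for u, i, r in order:
        err = r - predict(P, Q, u, i)
        for f in range(len(P[u])):
            pf, qf = P[u][f], Q[i][f]
            # Each step reads and writes P[u] and Q[i]; two ratings that share
            # a user or an item therefore touch the same row.
            P[u][f] += lr * (err * qf - reg * pf)
            Q[i][f] += lr * (err * pf - reg * qf)


# Hypothetical toy data: 4 users, 3 items, 8 observed ratings.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 3), (3, 2, 4)]
rng = random.Random(0)
P = init_factors(4, 4, rng)   # user factors
Q = init_factors(3, 4, rng)   # item factors

before = rmse(ratings, P, Q)
for _ in range(300):
    sgd_epoch(ratings, P, Q, lr=0.02, reg=0.05, rng=rng)
after = rmse(ratings, P, Q)
print(f"training RMSE: {before:.3f} -> {after:.3f}")
```

Because every update depends on the factor rows left by the previous one, running updates concurrently requires either scheduling ratings so that simultaneous updates do not share a user or item, or tolerating the resulting conflicts. That dependency is what makes parallelization non-trivial.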
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Nassar, M.A., El-Sayed, L.A.A., Taha, Y. (2019). GPU_MF_SGD: A Novel GPU-Based Stochastic Gradient Descent Method for Matrix Factorization. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-03405-4_18
DOI: https://doi.org/10.1007/978-3-030-03405-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03404-7
Online ISBN: 978-3-030-03405-4
eBook Packages: Engineering, Engineering (R0)