
Amenable Sparse Network Investigator

  • Conference paper
Intelligent Computing (SAI 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 711)


Abstract

We present the “Amenable Sparse Network Investigator” (ASNI) algorithm, which utilizes a novel pruning strategy based on a sigmoid function that induces the sparsity level globally over the course of a single round of training. The ASNI algorithm fulfills both tasks, whereas current state-of-the-art strategies can accomplish only one of them. The ASNI algorithm consists of two subalgorithms: 1) ASNI-I and 2) ASNI-II. ASNI-I learns an accurate sparse off-the-shelf network in only a single round of training. ASNI-II learns a sparse network together with an initialization that is quantized, compressed, and from which the sparse network is trainable. The learned initialization is quantized because only two numbers are learned for the initialization of the nonzero parameters of each layer; hence, for a network with L layers, the initialization of the entire network has only 2L quantization levels. The learned initialization is also compressed because it is a set consisting of just 2L numbers. We call the special sparse network that can be trained from such a quantized and compressed initialization amenable. For example, to initialize the more than 25 million parameters of an amenable ResNet-50, only \(2\times 54\) numbers are needed. To the best of our knowledge, no other algorithm can learn a quantized and compressed initialization from which the network remains trainable while also solving both pruning tasks. Our numerical experiments show that there exists a quantized and compressed initialization from which the learned sparse network can be trained to an accuracy on a par with its dense version. This is a step towards learning an ideal network that is sparse and quantized with very few quantization levels. We show experimentally that these 2L quantization levels are the concentration points of the parameters in each layer of the sparse network learned by ASNI-I. In other words, we show experimentally that in each layer of a deep neural network (DNN) there are two distinct normal-like distributions whose means can be used to initialize an amenable network. To corroborate the above, we perform a series of experiments with ResNets, VGG-style networks, small convolutional networks, and fully connected networks on the ImageNet, CIFAR-10, and MNIST datasets.
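
To make the two ideas above concrete, the following Python/PyTorch sketch illustrates (i) a sigmoid-shaped schedule that ramps a global sparsity level over a single round of training, and (ii) a per-layer initialization built from only two learned numbers per layer (for instance, the means of the positive and negative weight concentrations). This is a minimal sketch under our own assumptions, not the authors' ASNI code; the schedule constants, the magnitude-based global pruning step, and all function names are illustrative.

# Illustrative sketch only (not the authors' ASNI implementation): the schedule
# constants, the magnitude-based global pruning step, and all function names
# below are our own assumptions, chosen to mirror the two ideas in the abstract.

import math
import torch
import torch.nn as nn


def sigmoid_sparsity(epoch: int, total_epochs: int,
                     final_sparsity: float = 0.9, steepness: float = 10.0) -> float:
    """Target fraction of zeroed parameters at a given epoch.

    The target rises along a sigmoid centered at the middle of the single
    training round and approaches final_sparsity by the last epoch.
    """
    t = epoch / max(total_epochs - 1, 1)        # training progress in [0, 1]
    return final_sparsity / (1.0 + math.exp(-steepness * (t - 0.5)))


@torch.no_grad()
def global_magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    """Build binary masks that zero the globally smallest-magnitude weights."""
    weights = [p for p in model.parameters() if p.dim() > 1]     # skip biases
    magnitudes = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * magnitudes.numel())
    threshold = magnitudes.kthvalue(k).values if k > 0 else magnitudes.new_tensor(-1.0)
    return {id(w): (w.abs() > threshold).float() for w in weights}


@torch.no_grad()
def two_numbers_per_layer_init(model: nn.Module, masks: dict) -> None:
    """Re-initialize each layer's surviving weights from only two numbers:
    the mean of its positive weights and the mean of its negative weights
    (the two per-layer concentration points described in the abstract)."""
    for w in (p for p in model.parameters() if p.dim() > 1):
        keep = masks[id(w)].bool()
        pos, neg = w[(w > 0) & keep], w[(w < 0) & keep]
        pos_mean = pos.mean() if pos.numel() else w.new_tensor(0.0)
        neg_mean = neg.mean() if neg.numel() else w.new_tensor(0.0)
        w.copy_(torch.where(w > 0, pos_mean, neg_mean) * masks[id(w)])


# Example usage on a small fully connected network:
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = global_magnitude_masks(model, sigmoid_sparsity(epoch=40, total_epochs=50))
two_numbers_per_layer_init(model, masks)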


Notes

  1. \(\ell _0\) is not mathematically a norm: every norm \(\Vert \cdot \Vert \) satisfies \(\Vert \alpha \boldsymbol{\theta } \Vert = \vert \alpha \vert \Vert \boldsymbol{\theta }\Vert \) for all \(\alpha \in \mathbb {R}\), whereas for \(\alpha \ne 0\) we have \(\Vert \alpha \boldsymbol{\theta } \Vert _0 = \Vert \boldsymbol{\theta }\Vert _0\), so \(\Vert \alpha \boldsymbol{\theta } \Vert _0 = \vert \alpha \vert \Vert \boldsymbol{\theta }\Vert _0\) holds only if \(\vert \alpha \vert = 1\) (a worked example follows these notes).

  2. \(\boldsymbol{1}_{A}(x)=1\) if \(x\in A\) and \(\boldsymbol{1}_{A}(x)=0\) if \(x\notin A\).
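
As a concrete illustration of Note 1 (our own hypothetical example, not taken from the paper): for \(\boldsymbol{\theta } = (3, 0, -1)\) and \(\alpha = 2\), \(\Vert \alpha \boldsymbol{\theta }\Vert _0 = \Vert (6, 0, -2)\Vert _0 = 2\), while \(\vert \alpha \vert \,\Vert \boldsymbol{\theta }\Vert _0 = 2\times 2 = 4\), so absolute homogeneity fails.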


Author information


Corresponding author

Correspondence to Saeed Damadi.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Damadi, S., Nouri, E., Pirsiavash, H. (2023). Amenable Sparse Network Investigator. In: Arai, K. (eds) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol 711. Springer, Cham. https://doi.org/10.1007/978-3-031-37717-4_27
