Abstract
In cursive scripts such as Urdu, segmenting handwritten text lines is a difficult problem because of context sensitivity, the diagonal flow of the text, and related factors. In this work, we present a simple and robust line segmentation algorithm for handwritten and printed Urdu text. The proposed algorithm uses a modified header and baseline detection method. The technique relies purely on a pixel-counting approach, which efficiently segments handwritten and printed Urdu text lines and also detects skew. A dataset of handwritten and printed Urdu text was created manually to evaluate the algorithm. The handwritten dataset consists of 80 pages containing 687 text lines, and the printed dataset consists of 48 pages containing 495 text lines. The algorithm performed well on printed documents and on handwritten Urdu documents with well-separated lines, and moderately well on documents containing overlapping words.
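The abstract does not give the details of the modified header and baseline detection method, so the following is only a generic illustration of pixel counting for line segmentation, not the authors' algorithm. It assumes a binarized page stored as a NumPy array in which ink pixels are 1 and background pixels are 0. The function names segment_lines and baseline_row, the min_ink threshold, and the synthetic page are all hypothetical, and skew detection is not covered.

```python
import numpy as np

def segment_lines(binary, min_ink=1):
    """Split a binarized page into text-line bands by counting ink per row.

    Returns a list of (top, bottom) row indices, with bottom exclusive.
    min_ink is an assumed threshold: rows with fewer ink pixels count as gaps.
    """
    row_counts = binary.sum(axis=1)      # number of ink pixels in each row
    is_text = row_counts >= min_ink
    lines = []
    start = None
    for y, flag in enumerate(is_text):
        if flag and start is None:
            start = y                    # a text band begins
        elif not flag and start is not None:
            lines.append((start, y))     # the band ended on the previous row
            start = None
    if start is not None:                # band runs to the bottom edge
        lines.append((start, len(is_text)))
    return lines

def baseline_row(binary, top, bottom):
    """Crude baseline estimate: the row with the most ink inside a band."""
    counts = binary[top:bottom].sum(axis=1)
    return top + int(np.argmax(counts))

# Synthetic page with two "lines" separated by blank rows.
page = np.zeros((12, 20), dtype=np.uint8)
page[1:4, 2:18] = 1
page[2, :] = 1          # densest row of the first band
page[7:10, 3:15] = 1
page[9, 1:19] = 1       # densest row of the second band

bands = segment_lines(page)
baselines = [baseline_row(page, top, bottom) for top, bottom in bands]
```

A plain row-count profile like this separates lines only when there is a blank gap between them, which is consistent with the abstract's note that overlapping words are the harder case.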
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Malik, S.A., Maqsood, M., Aadil, F., Khan, M.F. (2020). An Efficient Segmentation Technique for Urdu Optical Character Recognizer (OCR). In: Arai, K., Bhatia, R. (eds) Advances in Information and Communication. FICC 2019. Lecture Notes in Networks and Systems, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-030-12385-7_11
DOI: https://doi.org/10.1007/978-3-030-12385-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12384-0
Online ISBN: 978-3-030-12385-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)