A Framework for Word Segmentation in Images using Density-based Clustering
10 pages • Published: March 9, 2020

Abstract
Word recognition is the task of identifying words in images of printed or handwritten documents. It is especially challenging for cursive handwriting. In this paper, we present a framework that uses density-based clustering for word segmentation in printed or handwritten documents, including cursive handwriting. First, we apply several preprocessing strategies: converting images to black-and-white (B/W), correcting tilted images, and removing background noise. K-means clustering and/or neighborhood density are used to find the parameters for the preprocessing steps. The preprocessing has been shown to be very effective. For word segmentation, we propose density-based clustering applied in multiple steps: blurring, plotting, and clustering. We also developed a system that implements the framework, including the preprocessing and clustering functionality. Our approach works very well for printed documents. It works reasonably well for handwritten documents if the words are relatively far from each other. Performance on handwritten documents can be further improved by using line segmentation.

Keyphrases: density based clustering, handwriting recognition, word segmentation

In: Gordon Lee and Ying Jin (editors). Proceedings of 35th International Conference on Computers and Their Applications, vol 69, pages 187-196.
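
The abstract does not give the clustering parameters or the exact form of the blurring and plotting steps, so the following is only a minimal sketch of the general idea: treat the ink pixels of a B/W image as 2D points and group them with a density-based clustering algorithm (a plain DBSCAN is assumed here). The function names `region_query`, `dbscan`, and `segment_words`, the toy image, and the values of `eps` and `min_pts` are all illustrative assumptions, not the paper's system. In this sketch the radius `eps` loosely stands in for the blurring step, since it decides how large a gap between strokes is still bridged within one word.

```python
from collections import deque


def region_query(points, idx, eps):
    """Return indices of all points within eps (Euclidean) of points[idx]."""
    px, py = points[idx]
    eps_sq = eps * eps
    return [j for j, (qx, qy) in enumerate(points)
            if (qx - px) ** 2 + (qy - py) ** 2 <= eps_sq]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = noise          # may later become a border point
            continue
        labels[i] = cluster_id         # i is a core point: start a cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == noise:
                labels[j] = cluster_id  # border point, not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


def segment_words(binary_image, eps, min_pts):
    """Cluster ink pixels and return one (x0, y0, x1, y1) box per cluster."""
    points = [(x, y) for y, row in enumerate(binary_image)
              for x, is_ink in enumerate(row) if is_ink]
    labels = dbscan(points, eps, min_pts)
    boxes = {}
    for (x, y), label in zip(points, labels):
        if label == -1:
            continue  # isolated pixels are treated as background noise
        x0, y0, x1, y1 = boxes.get(label, (x, y, x, y))
        boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return sorted(boxes.values())


# Toy B/W image (assumed data): '#' is ink, '.' is background.
rows = [
    "............",
    ".###...###..",
    ".###...###..",
    "...........#",
]
page = [[ch == "#" for ch in row] for row in rows]
boxes = segment_words(page, eps=1.5, min_pts=3)
print(boxes)
```

On this toy image the two blocks of ink are separated by a gap wider than `eps`, so they should come out as separate boxes, while the lone pixel in the last row has too few neighbors and is discarded as noise. The quadratic neighbor search is acceptable for a sketch; a real page would need a spatial index or a library implementation.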