A Framework for Word Segmentation in Images using Density-based Clustering

10 pages. Published: March 9, 2020

Abstract

Word recognition is the task of identifying words in images of printed or handwritten documents, and it is especially challenging for cursive handwriting. In this paper, we present a framework that uses density-based clustering for word segmentation in printed or handwritten documents, including cursive handwriting. First, we apply several preprocessing strategies: converting images to black-and-white, correcting tilted images, and removing background noise. K-means clustering and/or neighborhood density are used to find parameters for the preprocessing steps. The preprocessing has proven very effective. For word segmentation, we propose density-based clustering applied in multiple steps, including blurring, plotting, and clustering. We also developed a system that implements the framework, including the preprocessing and clustering functionality. Our approach works very well for printed documents. It works reasonably well for handwritten documents when words are relatively far apart. Performance on handwritten documents can be further improved by using line segmentation.
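The plotting and clustering steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the image has already been binarized (1 = ink, 0 = background), omits the blurring step, and uses a textbook DBSCAN written in plain Python. The function names (`foreground_points`, `dbscan`, `segment_words`) and the parameter values `eps` and `min_pts` are hypothetical choices made for illustration only.

```python
from collections import deque


def foreground_points(image):
    """'Plotting' step: list the (row, col) coordinates of ink pixels."""
    return [(r, c) for r, row in enumerate(image)
            for c, value in enumerate(row) if value]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    def neighbors(i):
        pr, pc = points[i]
        return [j for j, (qr, qc) in enumerate(points)
                if (pr - qr) ** 2 + (pc - qc) ** 2 <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def segment_words(image, eps=1.5, min_pts=3):
    """Return one bounding box (top, left, bottom, right) per cluster."""
    points = foreground_points(image)
    labels = dbscan(points, eps, min_pts)
    boxes = {}
    for (r, c), label in zip(points, labels):
        if label == -1:
            continue  # skip isolated specks
        top, left, bottom, right = boxes.get(label, (r, c, r, c))
        boxes[label] = (min(top, r), min(left, c), max(bottom, r), max(right, c))
    return [boxes[k] for k in sorted(boxes)]


# Toy binary image with two ink blobs separated by a wide gap.
rows = [
    "0000000000000",
    "0111000001110",
    "0111000001110",
    "0000000000000",
]
image = [[int(ch) for ch in row] for row in rows]
boxes = segment_words(image)
print(boxes)
```

In this toy image the two blobs are six columns apart, well beyond `eps`, so they fall into separate clusters. On real scans, suitable values for `eps` and `min_pts` would depend on resolution and word spacing, which is consistent with the abstract's remark that results are better when words are relatively far from each other.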

Keyphrases: density based clustering, handwriting recognition, word segmentation

In: Gordon Lee and Ying Jin (editors). Proceedings of 35th International Conference on Computers and Their Applications, vol 69, pages 187-196.

BibTeX entry
@inproceedings{CATA2020:Framework_Word_Segmentation_Images,
  author    = {Hui Guo and Qin Ding},
  title     = {A Framework for Word Segmentation in Images using Density-based Clustering},
  booktitle = {Proceedings of 35th International Conference on Computers and Their Applications},
  editor    = {Gordon Lee and Ying Jin},
  series    = {EPiC Series in Computing},
  volume    = {69},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {/publications/paper/JsFJ},
  doi       = {10.29007/hq3n},
  pages     = {187-196},
  year      = {2020}}