TY - GEN
T1 - Text detection and recognition in real world images
AU - Saabni, Raid
AU - Zwilling, Moti
PY - 2012
Y1 - 2012
AB - Detecting and recognizing text in real-world images, such as signboards and advertisements, is an important part of many computer vision applications. The problem is complicated by many factors, including nonuniform backgrounds, different languages and fonts, and inconsistent text alignment and orientation. In this paper, we present a novel approach to detecting characters and words in real-world images. The approach decomposes the gray-level image into a sequence of images, each containing only the pixels whose gray-level values fall within one of several disjoint ranges. This decomposition enables the extraction of connected components that represent characters or other non-textual objects separated from their surrounding background. A support vector machine uses an interpolation of two classes of features, translated into histograms, to classify the textual objects and group them into text zones. The Shape Context Descriptor [1] is used with the Earth Mover's Distance (EMD) to recognize the characters within the image. The recognized characters are fed to a heuristic rule-based system that determines words and produces the final results. To speed up the system, we follow the embedding of the EMD metric into a normed space presented in [22], which enables fast approximation of the κ-nearest neighbors using Locality-Sensitive Hashing (LSH) functions. Experiments show that our algorithm detects and recognizes text regions from the ICDAR 2005 datasets [17] with high detection and recognition rates.
KW - Earth mover's distance
KW - Embedding
KW - Locality-sensitive hashing
KW - Text detection
KW - Word searching
KW - κ-nearest neighbor
UR - http://www.scopus.com/inward/record.url?scp=84874258410&partnerID=8YFLogxK
U2 - 10.1109/ICFHR.2012.279
DO - 10.1109/ICFHR.2012.279
M3 - Conference contribution
AN - SCOPUS:84874258410
SN - 9780769547749
T3 - Proceedings - International Workshop on Frontiers in Handwriting Recognition, IWFHR
SP - 443
EP - 448
BT - Proceedings - 13th International Conference on Frontiers in Handwriting Recognition, ICFHR 2012
T2 - 13th International Conference on Frontiers in Handwriting Recognition, ICFHR 2012
Y2 - 18 September 2012 through 20 September 2012
ER -