© The Institution of Engineering and Technology
Proper partitioning of the feature space into text and background regions is very important in document image binarisation. This study presents an iterative classification algorithm that efficiently partitions a two-dimensional feature space into text and background regions. It uses the result of Niblack's binarisation algorithm as training data and employs its characteristics to define classification rules. In each iteration, it labels only those points of the feature space that can be classified reliably and leaves the classification of the other points to later iterations. The classification result of a point in the current iteration affects the classification of its neighbours in subsequent iterations and makes them more likely to be classified correctly. After a few iterations, the algorithm has partitioned the feature space into two regions associated with the text and background pixels. After partitioning, two global thresholding methods are used as an extra refinement of the text class to make the proposed algorithm robust against bleed-through and show-through degradations. Finally, each pixel is labelled as either text or background according to its corresponding region in the feature space. The authors’ binarisation algorithm demonstrated superior performance against six well-known algorithms on three datasets. It is appropriate for various types of degraded images.
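For readers unfamiliar with the training step, the following is a minimal sketch of Niblack's binarisation in its commonly described form, where the local threshold is the neighbourhood mean plus k times the neighbourhood standard deviation. It is not the authors' implementation: the function name `niblack_binarise`, the window size, the value of k, and the assumption of dark text on a light background are illustrative choices that the abstract does not specify.

```python
from math import sqrt


def niblack_binarise(image, window=3, k=-0.2):
    """Label each pixel as text (1) or background (0) with Niblack's rule.

    image:  list of rows of grey levels (dark text on light paper assumed).
    window: odd side length of the square neighbourhood (illustrative).
    k:      weight of the local standard deviation (assumed value).
    """
    height, width = len(image), len(image[0])
    half = window // 2
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            values = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            # Niblack's local threshold: mean + k * standard deviation.
            threshold = mean + k * sqrt(variance)
            # Pixels darker than the local threshold are taken as text.
            labels[y][x] = 1 if image[y][x] < threshold else 0
    return labels


if __name__ == "__main__":
    page = [
        [200, 200, 200],
        [200, 50, 200],
        [200, 200, 200],
    ]
    for row in niblack_binarise(page):
        print(row)
```

In the scheme described above, a label map of this kind would serve only as training data; the iterative partitioning of the feature space and the global-threshold refinement are separate steps that this sketch does not attempt to reproduce.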