IET Computer Vision
Volume 9, Issue 1, February 2015
Image clustering using exponential discriminant analysis
- Author(s): Nasir Ahmed
- Source: IET Computer Vision, Volume 9, Issue 1, p. 1 –12
- DOI: 10.1049/iet-cvi.2013.0144
- Type: Article
Local learning based image clustering models are usually employed to deal with images sampled from a non-linear manifold. Recently, various clustering models based on linear discriminant analysis (LDA) have been proposed. However, in these clustering models, a regularisation parameter was added to handle the small-sample-size (SSS) problem of LDA for high-dimensional image data. Owing to this, a number of clustering parameters had to be tuned for optimal clustering performance in existing local learning based clustering approaches. In this study, a less-parameterised local learning based image clustering model is proposed. The proposed local exponential discriminant clustering (LEDC) model is based on exponential discriminant analysis (EDA). In the LEDC model, the local scatter matrices are mapped into the exponential domain to handle the SSS problem of LDA without adding a regularisation parameter. In the proposed LEDC model, the number of nearest neighbours k is the only clustering parameter, in contrast to existing local learning based clustering approaches such as normalised cut, spectral embedded clustering and local discriminant model and global integration (LDMGI). Experimental results on twelve benchmark image datasets show that the LEDC model achieves clustering performance comparable to that of its nearest competitor, the LDMGI model. The clustering performance is comparable because no discriminant information of LDA is lost in EDA. The authors conclude that the proposed LEDC model is less-parameterised while matching the clustering performance of existing state-of-the-art local learning based clustering approaches.
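The core EDA step described above can be sketched as follows; this is a minimal illustration on toy data (the variable names, the toy dataset and the symmetric-eigendecomposition route to the matrix exponential are this sketch's assumptions, not taken from the paper):

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def eda_projection(X, labels, n_components):
    """Exponential discriminant analysis: like LDA, but the scatter
    matrices are mapped through the matrix exponential, which makes
    them full rank even when n_samples < n_features (the SSS case)."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # exp(Sw) is positive definite, so no regularisation term is needed
    M = np.linalg.solve(sym_expm(Sw), sym_expm(Sb))
    w, V = np.linalg.eig(M)
    order = np.argsort(-w.real)
    return V[:, order[:n_components]].real

# Toy high-dimensional data: 6 samples, 20 features (SSS regime for plain LDA)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (3, 20)), rng.normal(3, 1, (3, 20))])
y = np.array([0, 0, 0, 1, 1, 1])
W = eda_projection(X, y, 1)
Z = X @ W
```

Note that plain LDA would need to invert a rank-deficient Sw here; the matrix exponential sidesteps that without a tuned regulariser, which is the parameter saving the abstract describes.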
Effective background modelling and subtraction approach for moving object detection
- Author(s): Wei Liu ; Hongfei Yu ; Huai Yuan ; Hong Zhao ; Xiaowei Xu
- Source: IET Computer Vision, Volume 9, Issue 1, p. 13 –24
- DOI: 10.1049/iet-cvi.2013.0242
- Type: Article
This study presents a hierarchical background modelling and subtraction approach for real-time detection of moving objects. At the first level, a novel pixel-wise background modelling method is proposed for coarse detection. The method can dynamically assign the optimal number of components to each pixel with a borrow–lend strategy, and a flexible learning rate, variable and different for each component, is introduced to adapt to scene changes. Additionally, a new mechanism based on a finite state machine is introduced to maintain and update the background models. At the second level, to deal with sudden illumination changes, a block-wise foreground validation approach is adopted for refined detection. The authors compare the proposed approach with state-of-the-art methods, and experimental results under various scenes demonstrate its robustness and effectiveness.
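The modelling/subtraction split at the first level can be illustrated with a deliberately simplified pixel-wise model; a single running mean per pixel stands in for the paper's adaptive multi-component model with borrow–lend strategy, which is not reproduced here:

```python
import numpy as np

class RunningAverageBackground:
    """Minimal pixel-wise background model: each pixel keeps a running
    mean, and pixels far from that mean are flagged foreground. A
    simplified stand-in for the adaptive multi-component model in the
    paper, illustrating only the modelling/subtraction split."""
    def __init__(self, first_frame, alpha=0.05, threshold=25.0):
        self.mean = first_frame.astype(np.float64)
        self.alpha = alpha          # learning rate: higher adapts faster
        self.threshold = threshold  # intensity gap treated as foreground

    def apply(self, frame):
        frame = frame.astype(np.float64)
        fg = np.abs(frame - self.mean) > self.threshold
        # update only background pixels so foreground does not pollute the model
        self.mean = np.where(fg, self.mean,
                             (1 - self.alpha) * self.mean + self.alpha * frame)
        return fg

# Static 8x8 scene with a bright 2x2 "object" appearing later
scene = np.full((8, 8), 50.0)
model = RunningAverageBackground(scene)
for _ in range(10):                 # let the model settle on the empty scene
    model.apply(scene)
frame = scene.copy()
frame[3:5, 3:5] = 200.0             # moving object enters
mask = model.apply(frame)
```

The paper's contribution lies precisely in what this sketch omits: multiple components per pixel, a per-component learning rate, and a finite-state-machine update policy.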
Learning multi-planar scene models in multi-camera videos
- Author(s): Fei Yin ; Sergio A. Velastin ; Tim Ellis ; Dimitrios Makris
- Source: IET Computer Vision, Volume 9, Issue 1, p. 25 –40
- DOI: 10.1049/iet-cvi.2013.0261
- Type: Article
Many man-made environments are constructed with multiple levels where people walk, joined by stairs, ramps and overpasses. This study proposes a novel method to learn the geometry of a scene containing more than a single ground plane by tracking pedestrians and combining information from multiple views. The method estimates a scene model with multiple planes by measuring the variation of pedestrian heights across each camera's field of view. It segments the image into separate plane regions, estimating the relative depth and altitude for each image pixel, thus building a three-dimensional reconstruction of the scene. By estimating the multiple planes, the method enables tracking algorithms to follow objects (pedestrians and/or vehicles) that are moving on different ground planes in the scene. The authors also introduce what they believe is the first public dataset with pedestrian traffic on multiple planes to encourage other researchers to compare their work in this field.
Bi-level thresholding for binarisation of handwritten and printed documents
- Author(s): Jennifer Ranjani J.
- Source: IET Computer Vision, Volume 9, Issue 1, p. 41 –50
- DOI: 10.1049/iet-cvi.2013.0256
- Type: Article
Document image binarisation algorithms have been available in the literature for decades. However, most state-of-the-art methods address specific image degradations or characteristics. Moreover, they require one or more parameters to be tuned manually to produce a satisfactory binary image. In this study, a hybrid approach for document binarisation is presented. In the pre-processing stage, degradation in the background is smoothed using the L0-gradient minimisation algorithm and the foreground is enhanced using a local contrast feature. A divide-and-conquer based recursive auto-thresholding algorithm is then used to binarise the enhanced image. The proposed algorithm is evaluated objectively using evaluation metrics such as the F-measure, peak signal-to-noise ratio and negative rate metric. Extensive experiments over different datasets, including the Document Image Binarization Contest (DIBCO) 2009, Handwritten Document Image Binarization Competition (H-DIBCO) 2010, DIBCO 2011 and H-DIBCO 2012, show that the proposed hybrid binarisation algorithm significantly outperforms most state-of-the-art algorithms.
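As an illustration of the bi-level thresholding building block that recursive auto-thresholding schemes divide and conquer over, here is Otsu's classic global threshold (a standard textbook method, not the paper's exact algorithm):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximises between-class
    variance of the two resulting pixel populations. This is the basic
    bi-level thresholding step that recursive divide-and-conquer
    schemes, such as the one in the paper, build upon."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # class-0 probability up to t
    mu = np.cumsum(prob * np.arange(256))  # class-0 mean mass up to t
    mu_total = mu[-1]
    # between-class variance for every candidate threshold t
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[np.isnan(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))

# Synthetic "document": dark ink (value 30) on bright paper (value 220)
page = np.full((20, 20), 220, dtype=np.uint8)
page[5:8, 2:18] = 30                       # a stroke of text
t = otsu_threshold(page)
binary = page <= t                         # True where ink
```

A recursive scheme applies such a threshold to subregions of the image, which is what lets it cope with degradations that vary across the page.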
Probabilistic principal component analysis for texture modelling of adaptive active appearance models and its application for head pose estimation
- Author(s): Navid Mahmoudian Bidgoli ; Abolghasem A. Raie ; M. Naraghi
- Source: IET Computer Vision, Volume 9, Issue 1, p. 51 –62
- DOI: 10.1049/iet-cvi.2013.0317
- Type: Article
This study presents an application of human–robot interaction based on a three-dimensional real-time monocular head pose tracker in which active appearance models (AAMs) are used to extract facial features. To improve the texture model, two probabilistic approaches are proposed for principal component analysis in the presence of missing values. Using the suggested Bayesian model not only increases the fitting accuracy of the model but also reduces the number of model parameters, which may increase the speed of model fitting. Moreover, contrary to the common assumption in AAMs, the gradient matrix should not be assumed constant; a method is suggested in which the gradient matrix is adapted to new images during model fitting of video sequences as far as possible. Finally, the operator's head pose is estimated by means of the POSIT algorithm, and its implementation on a PeopleBot robot demonstrates enhanced human–robot interaction through control of the orientation of the robot camera.
Tracking with spatial constrained coding
- Author(s): Xiaolin Tian ; Licheng Jiao ; Fandi Zhao ; Xiaohua Zhang
- Source: IET Computer Vision, Volume 9, Issue 1, p. 63 –74
- DOI: 10.1049/iet-cvi.2014.0017
- Type: Article
A video tracking method based on spatial constrained coding (SCC) is proposed in this study. To characterise local image structure information, a dense scale-invariant feature transform (SIFT) descriptor is extracted for each pixel in the image. The proposed tracking method uses an SCC model that adopts a new constraint strategy, the weighted code, obtained as the sum of codes weighted by the grey values of neighbouring pixels and the distances between them. By taking spatial information into account, the model obtains robust codes for corresponding pixels across frames of complex scenes, which enhances the stability of coding and makes the tracker more robust for object tracking. Twelve challenging sequences involving partial or full occlusion, large pose variation and drastic illumination change are chosen to test the proposed method. The experimental results show that the proposed method performs excellently in comparison with previously proposed trackers.
Nose tip detection on three-dimensional faces using pose-invariant differential surface features
- Author(s): Ye Li ; YingHui Wang ; BingBo Wang ; LianSheng Sui
- Source: IET Computer Vision, Volume 9, Issue 1, p. 75 –84
- DOI: 10.1049/iet-cvi.2014.0070
- Type: Article
Three-dimensional (3D) facial data offer the potential to overcome the difficulties caused by variations in head pose and illumination in 2D face recognition. In 3D face recognition, localisation of the nose tip is essential for face normalisation, face registration, pose correction and so on. Most existing methods of nose tip detection on 3D faces deal mainly with frontal or near-frontal poses, or are rotation sensitive; many of them are training-based or model-based. In this study, a novel method of nose tip detection is proposed. Using pose-invariant differential surface features, namely high-order and low-order curvatures, it detects the nose tip on 3D faces under various poses automatically and accurately. Moreover, it requires no training and does not depend on any particular model. Experimental results on GavabDB verify the robustness and accuracy of the proposed method.
Salient region detection: an integration approach based on image pyramid and region property
- Author(s): Lingfu Kong ; Liangliang Duan ; Wenji Yang ; Yan Dou
- Source: IET Computer Vision, Volume 9, Issue 1, p. 85 –97
- DOI: 10.1049/iet-cvi.2013.0285
- Type: Article
Salient region detection is important for many computer vision tasks, as saliency detection results may serve as the basis for further high-level vision tasks such as object segmentation and tracking. In this study, the authors propose an integration approach to detect salient regions based on three principles drawn from psychological evidence and observations of images: colour contrast in a global context, spatially compact colour distribution and multi-scale image abstraction. Based on these principles, the authors' saliency analysis approach can be formulated in a unified framework. Moreover, they introduce a weighted salient image centre into their saliency estimation model, which boosts the performance of saliency detection. The method is evaluated on two publicly available databases, MSRA-1000 and MSRA-5000, and the experimental results demonstrate its effectiveness against other image saliency analysis approaches.
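The first principle, colour contrast in a global context, can be sketched in a single-channel form (the histogram-based formulation and the toy image are illustrative assumptions, not the authors' colour-space implementation):

```python
import numpy as np

def global_contrast_saliency(gray):
    """Histogram-based global contrast, a single-channel sketch of one
    of the three cues: a pixel is salient in proportion to its average
    intensity distance from all other pixels in the image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    # saliency of intensity v = sum_u hist[u] * |v - u|, for all 256 levels
    dist = np.abs(levels[:, None] - levels[None, :])
    sal_per_level = dist @ hist
    sal = sal_per_level[gray]
    return sal / sal.max()

# Small grey image with a rare bright patch: the patch should pop out
img = np.full((10, 10), 100, dtype=np.uint8)
img[4:6, 4:6] = 250
sal = global_contrast_saliency(img)
```

Computing contrast per histogram bin rather than per pixel pair keeps the cost linear in the number of pixels, which is why global-contrast methods typically work this way.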
Face tracking based on differential harmony search
- Author(s): Ming-Liang Gao ; Li-Li Li ; Xian-Ming Sun ; Dai-Sheng Luo
- Source: IET Computer Vision, Volume 9, Issue 1, p. 98 –109
- DOI: 10.1049/iet-cvi.2014.0035
- Type: Article
Owing to its significant role in computer vision applications, human face tracking has drawn extensive attention in recent years. Most researchers solve face tracking using the particle filter, meanshift and their derivatives. Unlike these traditional methods, in this study face tracking is treated as an optimisation problem, and a new meta-heuristic optimisation algorithm, differential harmony search (DHS), is introduced to solve it. The authors compare the speed and accuracy of the proposed method with the particle filter, meanshift and improved harmony search. Experimental results show that the DHS-based tracker is faster and more accurate, and its parameters are easy to tune. Furthermore, to improve the reliability of tracking, multiple visual cues are applied to the DHS-based tracking system, and experimental results demonstrate the increased robustness achieved by fusing multiple cues.
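To illustrate the optimiser class involved, here is a generic harmony search with a differential-style pitch adjustment, minimising a toy objective that stands in for a tracking similarity score; the parameter names (hms, hmcr, par) and values are conventional harmony-search assumptions, not the authors' settings:

```python
import numpy as np

def differential_harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3,
                                iters=500, seed=0):
    """Harmony search with a differential-style pitch adjustment: new
    pitches are perturbed by the difference of two random harmonies,
    the defining idea of DHS. A toy sketch of the optimiser class the
    paper applies to face tracking, not the authors' exact algorithm."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = lo.size
    memory = rng.uniform(lo, hi, (hms, dim))       # harmony memory
    scores = np.array([f(x) for x in memory])
    for _ in range(iters):
        new = np.empty(dim)
        for d in range(dim):
            if rng.random() < hmcr:                # pick from memory...
                new[d] = memory[rng.integers(hms), d]
                if rng.random() < par:             # ...then pitch-adjust
                    a, b = rng.integers(hms, size=2)
                    new[d] += rng.random() * (memory[a, d] - memory[b, d])
            else:                                  # or sample at random
                new[d] = rng.uniform(lo[d], hi[d])
        new = np.clip(new, lo, hi)
        worst = np.argmax(scores)
        s = f(new)
        if s < scores[worst]:                      # replace worst harmony
            memory[worst], scores[worst] = new, s
    best = np.argmin(scores)
    return memory[best], scores[best]

# Toy objective: distance to a hidden "face position" at (3, -2)
target = np.array([3.0, -2.0])
x, val = differential_harmony_search(lambda p: np.sum((p - target) ** 2),
                                     (np.full(2, -10.0), np.full(2, 10.0)))
```

In the tracking setting, the objective would instead score how well a candidate face state matches the appearance model in the current frame.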
Multi-scale mean shift tracking
- Author(s): Wangsheng Yu ; Xiaohua Tian ; Zhiqiang Hou ; Yufei Zha ; Yuan Yang
- Source: IET Computer Vision, Volume 9, Issue 1, p. 110 –123
- DOI: 10.1049/iet-cvi.2014.0077
- Type: Article
In this study, a three-dimensional mean shift tracking algorithm, which combines a multi-scale model with a background weighted spatial histogram, is proposed to address the problem of scale estimation within the framework of mean shift tracking. The target template is modelled with a multi-scale model and described with a three-dimensional spatial histogram. The tracking algorithm is implemented by a three-dimensional mean shift iteration, which translates the problem of scale estimation in the two-dimensional image plane into localisation in three-dimensional image space. To enhance robustness, the background weighted histogram is employed to suppress background information in the target candidate model. Firstly, the multi-scale model and three-dimensional spatial histogram are introduced to represent the target template. Then, the three-dimensional mean shift iteration formulation is derived from the similarity measure between the target model and the target candidate model. Finally, the complete multi-scale mean shift tracking algorithm is assembled from these components. The proposed algorithm is evaluated on challenging sequences containing scale-changing targets and other complex appearance variations, in comparison with three representative mean shift based tracking algorithms. Both the qualitative results and quantitative analysis indicate that the proposed algorithm outperforms the reference algorithms in both tracking precision and scale estimation.
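The mean shift iteration that the algorithm lifts into three dimensions can be sketched in its plain 2D form (the synthetic likelihood map, window radius and iteration count are illustrative assumptions):

```python
import numpy as np

def mean_shift_2d(weight_map, start, radius=3, iters=20):
    """Plain 2-D mean shift: repeatedly move the window to the weighted
    centroid of the pixels inside it. The paper extends exactly this
    iteration with a third (scale) coordinate so that position and
    scale are estimated jointly; this sketch shows the 2-D core only."""
    ys, xs = np.mgrid[0:weight_map.shape[0], 0:weight_map.shape[1]]
    pos = np.array(start, dtype=np.float64)
    for _ in range(iters):
        mask = (ys - pos[0]) ** 2 + (xs - pos[1]) ** 2 <= radius ** 2
        w = weight_map * mask
        if w.sum() == 0:
            break
        new = np.array([(ys * w).sum(), (xs * w).sum()]) / w.sum()
        if np.allclose(new, pos):   # converged to a mode of the map
            break
        pos = new
    return pos

# Likelihood map with a single bright blob centred at (7, 12)
H, W = 20, 20
ys, xs = np.mgrid[0:H, 0:W]
blob = np.exp(-((ys - 7) ** 2 + (xs - 12) ** 2) / 4.0)
pos = mean_shift_2d(blob, start=(10, 10))
```

In the full algorithm, the weight map comes from comparing the target histogram with candidate histograms, and the iteration runs over (x, y, scale) rather than (x, y).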
Online visual tracking by integrating spatio-temporal cues
- Author(s): Yang He ; Mingtao Pei ; Min Yang ; Yuwei Wu ; Yunde Jia
- Source: IET Computer Vision, Volume 9, Issue 1, p. 124 –137
- DOI: 10.1049/iet-cvi.2013.0247
- Type: Article
The performance of online visual trackers has improved significantly, but designing an effective appearance-adaptive model remains challenging because errors accumulate as the model is updated with newly obtained results, causing tracker drift. In this study, the authors propose a novel online tracking algorithm that integrates spatio-temporal cues to alleviate the drift problem. The goal is a more robust way of updating an adaptive appearance model. The model consists of multiple modules called temporal cues, which are updated alternately so that both the historical and the current information of the tracked object is kept to handle drastic appearance change. Each module is represented by several fragments called spatial cues. To incorporate all the spatial and temporal cues, the authors develop an efficient cue quality evaluation criterion that combines appearance and motion information; the tracking results are then obtained by a two-stage dynamic integration mechanism. Both qualitative and quantitative evaluations on challenging video sequences demonstrate that the proposed algorithm performs favourably against state-of-the-art methods.
Scene text recognition by learning co-occurrence of strokes based on spatiality embedded dictionary
- Author(s): Song Gao ; Chunheng Wang ; Baihua Xiao ; Cunzhao Shi ; Wen Zhou ; Zhong Zhang
- Source: IET Computer Vision, Volume 9, Issue 1, p. 138 –148
- DOI: 10.1049/iet-cvi.2014.0022
- Type: Article
Text information contained in scene images is very helpful for high-level image understanding. In this study, the authors propose to learn the co-occurrence of local strokes for scene text recognition using a spatiality embedded dictionary (SED). Unlike spatial pyramids, which partition images into grids to incorporate spatial information, the authors' SED associates every codeword with a particular response region and introduces more precise spatial information for robust character recognition. After localised soft coding and max pooling in the first layer, a sparse dictionary is learned to model the co-occurrence of several local strokes, which further improves classification performance. Experimental results on two scene character recognition datasets, ICDAR2003 and Chars74K, demonstrate that the character recognition method outperforms state-of-the-art methods. Moreover, competitive word recognition results are reported on four benchmark word recognition datasets, ICDAR2003, ICDAR2011, ICDAR2013 and Street View Text, when the character recognition method is combined with a conditional random field language model.
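The first-layer coding pipeline, localised soft coding followed by max pooling, can be sketched generically (the codebook size, k and beta values are illustrative assumptions, and the toy descriptors are random vectors, not SIFT features):

```python
import numpy as np

def localised_soft_coding(descriptors, codebook, k=3, beta=10.0):
    """Localised soft assignment: each descriptor is coded only over its
    k nearest codewords, with exponentially decaying weights. A generic
    sketch of the first-layer coding step, not the paper's SED variant,
    which additionally ties codewords to response regions."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = np.zeros((descriptors.shape[0], codebook.shape[0]))
    for i, row in enumerate(d2):
        nn = np.argsort(row)[:k]            # k nearest codewords only
        w = np.exp(-beta * row[nn])
        codes[i, nn] = w / w.sum()          # normalised soft weights
    return codes

def max_pool(codes):
    """Max pooling over an image's descriptors: keep, per codeword, the
    strongest response seen anywhere in the image."""
    return codes.max(axis=0)

rng = np.random.default_rng(1)
codebook = rng.normal(size=(8, 16))          # 8 codewords, 16-dim toy "strokes"
descriptors = codebook[[2, 2, 5]] + 0.01 * rng.normal(size=(3, 16))
feat = max_pool(localised_soft_coding(descriptors, codebook))
```

The SED idea restricts each codeword's pooling to its own response region instead of the whole image, which is what injects the precise spatial information the abstract highlights.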
Cooperative object tracking using dual-pan–tilt–zoom cameras based on planar ground assumption
- Author(s): Zhigao Cui ; Aihua Li ; Guoyan Feng ; Ke Jiang
- Source: IET Computer Vision, Volume 9, Issue 1, p. 149 –161
- DOI: 10.1049/iet-cvi.2013.0246
- Type: Article
Pan–tilt–zoom (PTZ) cameras play an important role in visual surveillance systems, and the dual-PTZ camera system is the simplest and most typical configuration. Its strength is that it can obtain both large-view information and high-resolution local-view information of the tracked object at the same time. One way to achieve this is a master–slave configuration: one camera (the master) tracks moving objects at low resolution and provides positional information to another camera (the slave), which then points towards the object at high resolution and tracks it dynamically. In this study, the authors propose a novel framework that exploits the planar ground assumption to achieve cooperative tracking. The approach differs from conventional methods in that a planar geometric constraint is used to solve the camera collaboration problem. Compared with existing approaches, the proposed framework can be used with wide baselines and allows the depth of the tracked object to change; it can also adapt to dynamic changes in the surveillance scene. The authors further describe a self-calibration method for the homography matrix induced by the ground plane between the two cameras. The effectiveness of the proposed method is demonstrated by testing it with a tracking system for surveillance applications.
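The planar-ground constraint at the heart of the collaboration step is a single homography between the two views. A minimal sketch of transferring a master-view foot point to the slave view follows; the matrix below is an invented example standing in for the self-calibrated mapping, not a calibrated result:

```python
import numpy as np

def apply_homography(H, pt):
    """Map an image point through a plane-induced homography using
    homogeneous coordinates. Under the planar-ground assumption, the
    foot point of a tracked object in the master view transfers to the
    slave view with one 3x3 matrix, which is what lets the slave PTZ
    camera aim at the object without explicit depth estimation."""
    x = H @ np.array([pt[0], pt[1], 1.0])
    return x[:2] / x[2]          # back to inhomogeneous pixel coordinates

# Illustrative homography (an assumption, not calibrated from real cameras):
# a similarity transform standing in for the master->slave ground mapping
theta = np.deg2rad(30)
H = np.array([[np.cos(theta), -np.sin(theta), 40.0],
              [np.sin(theta),  np.cos(theta), 10.0],
              [0.0,            0.0,            1.0]])
foot_master = (100.0, 200.0)     # object's ground contact in the master view
foot_slave = apply_homography(H, foot_master)
```

Because the mapping is invertible, the same matrix supports handing a target back from slave to master via `np.linalg.inv(H)`.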