IET Computer Vision
Volume 10, Issue 8, December 2016
Improved hybrid method for image super-resolution
- Author(s): Weiwei Xing ; Yahui Zhao ; Ergude Bao
- Source: IET Computer Vision, Volume 10, Issue 8, p. 769 –779
- DOI: 10.1049/iet-cvi.2015.0464
- Type: Article
Improving image resolution has broad applications and is an important research topic. Recently, a hybrid method, adaptive sparse domain selection (ASDS), combining a reconstruction-based method and an example-based method was proposed to take advantage of both, but it may not reconstruct sufficient details. In this study, the authors propose to improve ASDS: Zeyde's method is first used to obtain an intermediate image with high-frequency details, and the obtained image then replaces the autoregressive model of ASDS as the example-based term. In addition, the authors split the input image into patches and use different parameter settings for patches with different amounts of detail. Experimental results demonstrate that the improved hybrid methods can produce high-quality images both quantitatively and perceptually.
Efficient technique for rice grain classification using back-propagation neural network and wavelet decomposition
- Author(s): Ksh. Robert Singh and Saurabh Chaudhury
- Source: IET Computer Vision, Volume 10, Issue 8, p. 780 –787
- DOI: 10.1049/iet-cvi.2015.0486
- Type: Article
This study describes the classification of four varieties of bulk rice grain images using a back-propagation neural network (BPNN). Eighteen colour features, 27 texture features from the grey-level co-occurrence matrix, 24 wavelet features and 45 combined features (colour and texture together) were extracted from the colour images of bulk rice grains. Classification was carried out on three different data sets of images captured under different environmental conditions. It is seen that the BPNN is able to classify the four varieties of rice grain faithfully even with poor image quality. It is also found that classification based on reduced wavelet features outperforms classification using the other features (colour and texture features taken separately) on the two data sets with the lowest resolution. The authors further compared the proposed BPNN technique with other classifiers, namely the support vector machine, k-nearest neighbour and naive Bayes classifiers, on all three data sets. An average classification accuracy of more than 96% was achieved consistently using the BPNN on all feature types for each data set.
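As a rough illustration of the grey-level co-occurrence matrix (GLCM) texture features mentioned above, the sketch below computes a normalised GLCM for one pixel offset and three classic Haralick-style statistics. This is a generic example, not the paper's exact 27-feature set; the toy image and the chosen offset are illustrative assumptions.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Grey-level co-occurrence matrix for one pixel offset (dx, dy),
    normalised so its entries sum to 1."""
    m = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[image[y, x], image[y + dy, x + dx]] += 1
    return m / m.sum()

def glcm_features(m):
    """Three classic Haralick-style statistics of a normalised GLCM."""
    i, j = np.indices(m.shape)
    contrast = ((i - j) ** 2 * m).sum()
    energy = (m ** 2).sum()
    homogeneity = (m / (1.0 + np.abs(i - j))).sum()
    return contrast, energy, homogeneity

# Toy 4-level image; real grain images would first be quantised.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
m = glcm(img, dx=1, dy=0, levels=4)
contrast, energy, homogeneity = glcm_features(m)
```

Such statistics, computed for several offsets and directions, form the kind of texture feature vector fed to the classifier.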
Recovering defective Landsat 7 Enhanced Thematic Mapper Plus images via multiple linear regression model
- Author(s): Asmaa Sadiq ; Ghazali Sulong ; Loay Edwar
- Source: IET Computer Vision, Volume 10, Issue 8, p. 788 –797
- DOI: 10.1049/iet-cvi.2016.0009
- Type: Article
In 2003, the scan line corrector (SLC) of the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) sensor failed permanently, preventing the retrieval of 22% of the pixels in each Landsat 7 SLC-off image. This failure has seriously limited the scientific applications and usability of ETM+ data. Precise and complete recovery of the missing pixels in Landsat 7 SLC-off images is a challenging issue, and an efficient gap-filling algorithm that improves ETM+ data usability remains in demand. In this study, a new gap-filling method is introduced to reconstruct SLC-off images using multi-temporal SLC-off auxiliary fill images. A correlation is established between the corresponding pixels of the target SLC-off image and two auxiliary fill images in parallel using a multiple linear regression model. Both simulated and actual defective Landsat 7 images were tested to assess the performance of the proposed model, comparing it with two multi-temporal methods: the local linear histogram matching method and the neighbourhood similar pixel interpolator method. The quantitative evaluations indicate that the proposed method makes accurate estimates of the missing values even for temporally distant fill images.
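The core of the gap-filling idea above can be sketched as an ordinary multiple linear regression fitted on pixels observed in the target image and applied to the gap pixels. This is a minimal illustration on synthetic arrays, not the authors' full pipeline for real multi-temporal Landsat scenes; `fill_gaps` and the toy data are assumptions.

```python
import numpy as np

def fill_gaps(target, fill1, fill2, gaps):
    """Estimate gap pixels of `target` from two auxiliary fill images
    via multiple linear regression: target ~ b0 + b1*fill1 + b2*fill2."""
    valid = ~gaps
    X = np.column_stack([np.ones(valid.sum()), fill1[valid], fill2[valid]])
    b, *_ = np.linalg.lstsq(X, target[valid], rcond=None)
    out = target.astype(float)          # astype copies the array
    out[gaps] = b[0] + b[1] * fill1[gaps] + b[2] * fill2[gaps]
    return out

# Synthetic demo: the target is an exact linear mix of the fill images.
rng = np.random.default_rng(0)
f1 = rng.uniform(0.0, 1.0, (8, 8))
f2 = rng.uniform(0.0, 1.0, (8, 8))
truth = 2.0 + 0.5 * f1 + 0.3 * f2
gaps = np.zeros((8, 8), dtype=bool)
gaps[::3] = True                        # stripes of missing scan lines
observed = truth.copy()
observed[gaps] = 0.0
restored = fill_gaps(observed, f1, f2, gaps)
```

On real data the linear relation only holds approximately, so the regression gives a least-squares estimate rather than an exact recovery.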
Training-based head pose estimation under monocular vision
- Author(s): Zhizhi Guo ; Qianxiang Zhou ; Zhongqi Liu ; Chunhui Liu
- Source: IET Computer Vision, Volume 10, Issue 8, p. 798 –805
- DOI: 10.1049/iet-cvi.2015.0457
- Type: Article
Although many 3D head pose estimation methods based on monocular vision can achieve an accuracy of 5°, reducing the number of required training samples and avoiding the use of hardware parameters as input features remain among the biggest challenges in head pose estimation. To address these challenges, the authors propose an accurate head pose estimation method which can act as an extension to facial key point detection systems. The basic idea is to use the normalised distances between key points as input features, and to use ℓ1-minimisation to select a set of sparse training samples that reflect the mapping between the feature vector space and the head pose space. The linear combination of the head poses corresponding to these samples represents the head pose of the test sample. The experimental results show that the authors' method can achieve an accuracy of 2.6° without any extra hardware parameters or information about the subject. In addition, under large head movements and varying illumination, the method is still able to estimate the head pose.
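The ℓ1-minimisation step described above can be sketched with a generic iterative soft-thresholding (ISTA) solver: the test feature is coded as a sparse combination of training feature vectors, and the pose is read off as the matching combination of training poses. The solver, the toy dictionary and the weight normalisation below are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(D, y, lam=0.01, iters=1000):
    """ISTA for min_w 0.5*||D w - y||^2 + lam*||w||_1.
    Columns of D are training feature vectors."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(iters):
        w = soft(w - D.T @ (D @ w - y) / L, lam / L)
    return w

# Toy dictionary: three training samples with known yaw angles (degrees).
D = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])          # columns = feature vectors
poses = np.array([-30.0, 30.0, 0.0])
w = sparse_code(D, D[:, 0])              # test feature equals sample 0
est = poses @ w / max(w.sum(), 1e-12)    # weighted combination of poses
```

Because the test feature matches the first training sample, nearly all the sparse weight lands on it and the estimate recovers that sample's pose.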
Fingerprint frequency normalisation and enhancement using two-dimensional short-time Fourier transform analysis
- Author(s): Mubeen Ghafoor ; Imtiaz Ahmad Taj ; Mohammad Noman Jafri
- Source: IET Computer Vision, Volume 10, Issue 8, p. 806 –816
- DOI: 10.1049/iet-cvi.2016.0005
- Type: Article
A fingerprint image with non-uniform ridge frequencies can be considered a two-dimensional dynamic signal. Non-uniform stress on the sensing area during fingerprint acquisition may result in a non-linear distortion that disturbs the local frequency of ridges, adversely affecting the matching performance. This study presents a new approach based on short-time Fourier transform analysis and locally adaptive contextual filtering for frequency distortion removal and enhancement. In the proposed approach, the fingerprint image is divided into sub-images to determine the local dominant frequency and orientation. Gaussian directional band-pass filtering is then adaptively applied in the frequency domain. The filtered sub-images are combined in the spatial domain using a novel technique to obtain an enhanced fingerprint image with high ridge quality and uniform inter-ridge distance. Simulation results show the efficacy of the proposed enhancement technique compared with other well-known contextual filtering based enhancement techniques reported in the literature.
Single image dehazing with bright object handling
- Author(s): Irfan Riaz ; Xue Fan ; Hyunchul Shin
- Source: IET Computer Vision, Volume 10, Issue 8, p. 817 –827
- DOI: 10.1049/iet-cvi.2015.0451
- Type: Article
This study addresses the shortcomings of the dark channel prior (DCP). The authors propose a new and efficient method for transmission estimation with bright-object handling capability. Based on the intensity of a bright surface, they categorise DCP failures into two types: (i) obvious failures, which occur on surfaces brighter than the ambient light; for these surfaces, they show that altering the transmission value in proportion to the brightness is better than a thresholding strategy; and (ii) non-obvious failures, which occur on surfaces brighter than the neighbourhood average haziness value; based on the observation that the transmission of a surface is loosely connected to that of its neighbours, the local average haziness value is used to recompute the transmission of such surfaces. This twofold strategy produces a better estimate of block- and pixel-level haze thickness than the DCP. To reduce haloes, a reliability map of the block-level haze is generated; then, via reliability-guided fusion of the block- and pixel-level haze values, a high-quality refined transmission is obtained. Experimental results show that the authors' method competes well with state-of-the-art methods on typical benchmark images while outperforming them in more challenging scenarios. The proposed reliability-guided fusion technique is about 60 times faster than other well-known DCP-based approaches.
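For context, the standard DCP transmission estimate that the paper improves on is t = 1 - omega * dark(I / A). The sketch below reproduces that baseline and its obvious failure mode on a bright object; the patch size, atmospheric light A and toy image are assumptions.

```python
import numpy as np

def dark_channel(img, patch=3):
    """Minimum over colour channels, then over a local square patch."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    out = np.empty_like(mins)
    h, w = mins.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def transmission(img, A, omega=0.95, patch=3):
    """Baseline DCP transmission estimate: t = 1 - omega * dark(I / A)."""
    return 1.0 - omega * dark_channel(img / A, patch)

# A uniformly dim scene with one bright object (all channels = 1.0).
img = np.full((5, 5, 3), 0.1)
img[2, 2] = 1.0
t = transmission(img, A=1.0, patch=1)   # pixel-level haze estimate
# t is 0.905 over the dim scene, but only 0.05 at the bright object:
# the "obvious failure" that bright-object handling must correct.
```

The paper's contribution is precisely the handling of such bright surfaces, where the baseline wrongly reads brightness as dense haze.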
Kernel locality-constrained sparse coding for head pose estimation
- Author(s): Hyunduk Kim ; Myoung-Kyu Sohn ; Dong-Ju Kim ; Sang-Heon Lee
- Source: IET Computer Vision, Volume 10, Issue 8, p. 828 –835
- DOI: 10.1049/iet-cvi.2015.0242
- Type: Article
In many situations it would be useful for a computer system's user interface to have a model of where a person is looking and what the user is paying attention to. In this study, the authors describe a novel feature coding method for head pose estimation. The widely used sparse coding (SC) method encodes a test sample as a sparse linear combination of training samples; however, it does not consider the underlying structure of the data in the feature space. In contrast, locality-constrained linear coding (LLC) uses locality constraints to project each input into its local coordinate system. Building on the recent success of LLC, the authors introduce locality-constrained sparse coding (LSC) to overcome the limitation of SC. They also propose kernel locality-constrained sparse coding, a non-linear extension of LSC: using the kernel trick, the input data are implicitly mapped into the kernel feature space associated with the kernel function. In experiments, the proposed algorithm was applied to head pose estimation, and the results demonstrated the method's increased effectiveness and robustness.
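The locality-constrained coding idea can be sketched with the common analytical LLC approximation: encode a sample using only its k nearest codebook atoms, with coefficients constrained to sum to one. This follows the widely used LLC formulation rather than the authors' LSC or kernel variants; the toy codebook and regularisation constant are assumptions.

```python
import numpy as np

def llc_code(B, x, k=2):
    """Analytical LLC approximation: encode x with its k nearest
    codebook columns, coefficients constrained to sum to one."""
    d = np.linalg.norm(B - x[:, None], axis=0)    # distance to each atom
    idx = np.argsort(d)[:k]
    z = B[:, idx] - x[:, None]                    # shifted local bases
    C = z.T @ z                                   # local covariance
    C += 1e-8 * (np.trace(C) + 1.0) * np.eye(k)   # regularise
    c = np.linalg.solve(C, np.ones(k))
    c /= c.sum()                                  # enforce sum-to-one
    w = np.zeros(B.shape[1])
    w[idx] = c
    return w

B = np.eye(3)                  # toy codebook: one atom per axis
x = np.array([1.0, 0.0, 0.0])
w = llc_code(B, x)             # nearly all weight on the nearest atom
```

Because the coefficients are confined to nearby atoms, the code respects the local structure of the feature space that plain SC ignores.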
Multi-step linear representation-based classification for face recognition
- Author(s): Jian-Xun Mi and Tao Liu
- Source: IET Computer Vision, Volume 10, Issue 8, p. 836 –841
- DOI: 10.1049/iet-cvi.2015.0462
- Type: Article
Error detection is an important approach to improving the robustness of face recognition methods. However, it is hard to directly detect invalid pixels in a facial image. In this study, the authors decompose this hard problem into many simpler sub-problems: the error detection process is divided into multiple phases, and a portion of the invalid pixels is detected in each phase. The goal is to decrease the ratio of invalid pixels to all pixels in a test image, which progressively improves the final recognition accuracy. The method's performance on occlusion and corruption problems is evaluated on different databases. In addition, comparison with other state-of-the-art studies shows that the proposed method achieves the best results on face occlusion and disguise problems.
Random walks colour histogram modification for human tracking
- Author(s): Houqiang Zhao ; Ke Xiang ; Songxiao Cao ; Xuanyin Wang
- Source: IET Computer Vision, Volume 10, Issue 8, p. 842 –851
- DOI: 10.1049/iet-cvi.2015.0371
- Type: Article
Accurate human tracking in surveillance scenes is a prerequisite for other tasks. However, when the human target is small, the extracted features may not be prominent and the tracking performance is thus unsatisfactory. The colour feature is relatively robust to changes in target size and shape, but it is easily affected by background information. For these reasons, the authors introduce random walker segmentation into human tracking and determine the background region according to the distribution characteristics of the segmentation results. Even if the colour of the target is very similar to that of the background, this algorithm can segment the target. Furthermore, principal component analysis is also used to distinguish human targets from the background. During tracking, the authors prevent degradation of the target model by adding new target information. To overcome the mean-shift local optimisation problem, the authors search for the candidate target region with the largest weight according to the sum of all probabilities in each region. Experimental results show that the authors' tracking algorithm performs better on small human targets in challenging scenes than several existing tracking methods.
Efficient vanishing point detection method in unstructured road environments based on dark channel prior
- Author(s): Weili Ding ; Yong Li ; Honghai Liu
- Source: IET Computer Vision, Volume 10, Issue 8, p. 852 –860
- DOI: 10.1049/iet-cvi.2015.0390
- Type: Article
Vanishing point detection is a key technique in fields such as road detection, camera calibration and visual navigation. This study presents a new vanishing point detection method, which achieves efficiency by using a dark channel prior-based segmentation method and an adaptive straight-line search mechanism in the road region. First, the dark channel prior information is used to segment the image into a series of regions. Then straight lines are extracted from the region contours, and the straight lines in the road region are estimated by a vertical envelope and a perspective quadrilateral constraint. The vertical envelope roughly divides the whole image into a sky region, a vertical region and a road region. The perspective quadrilateral constraint, as defined by the authors, eliminates the interference of vertical lines inside the road region so that the approximate straight lines in the road region can be extracted. Finally, the vanishing point is estimated by mean-shift clustering, which is computed based on the proposed grouping strategies and intersection principles. Experiments were conducted on a large number of road images under different environmental conditions, and the results demonstrate that the proposed algorithm can estimate the vanishing point accurately and efficiently in unstructured road scenes.
Enhanced hand part classification from a single depth image using random decision forests
- Author(s): Myoung-Kyu Sohn ; Sang-Heon Lee ; Hyunduk Kim ; Hyeyoung Park
- Source: IET Computer Vision, Volume 10, Issue 8, p. 861 –867
- DOI: 10.1049/iet-cvi.2015.0239
- Type: Article
Hand pose recognition has received increasing attention in the area of human–computer interaction. With the recent spread of low-cost three-dimensional (3D) cameras, research into understanding more natural gestures has increased. In this study, the authors present a method for hand part classification and joint estimation from a single depth image. They apply random decision forests (RDFs) for hand part classification: foreground pixels in the hand image are classified by the RDF, and the hand joints are then estimated from the classified hand parts. They suggest a robust feature extraction method for per-pixel classification, which enhances the accuracy of hand part classification. They also propose a tree selection algorithm that uses a legacy trained RDF to classify unseen test data; selecting trees with the proposed method shows better performance than using all trees as in the conventional method. Depth images and label images synthesised with a 3D hand mesh model were used for training the forests and verifying the algorithm. The authors' experiments show that the enhanced algorithm outperforms the state-of-the-art method in accuracy.
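Per-pixel features for RDF classification on depth images are commonly built as depth-normalised probe differences; the sketch below shows one such generic feature. It illustrates the family of features involved, not the authors' proposed robust feature; the offsets, toy depth map and background value are assumptions.

```python
import numpy as np

def depth_feature(depth, px, u, v, background=10.0):
    """Depth-invariant comparison feature: difference of two probe depths
    at offsets scaled by 1/depth at the reference pixel."""
    y, x = px
    d = depth[y, x]
    def probe(off):
        oy = y + int(round(off[0] / d))
        ox = x + int(round(off[1] / d))
        if 0 <= oy < depth.shape[0] and 0 <= ox < depth.shape[1]:
            return depth[oy, ox]
        return background       # off-image probes read as far background
    return probe(u) - probe(v)

depth = np.full((5, 5), 2.0)    # hand surface at 2 m
depth[0, :] = 1.0               # a nearer structure along the top row
f = depth_feature(depth, (2, 2), (-4.0, 0.0), (0.0, 0.0))
```

Each internal node of a decision tree thresholds one such feature, so the forest learns which probe pairs discriminate the hand parts.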
Surveillance video synopsis generation method via keeping important relationship among objects
- Author(s): Yumin Tian ; Haihong Zheng ; Qichao Chen ; Dan Wang ; Risan Lin
- Source: IET Computer Vision, Volume 10, Issue 8, p. 868 –872
- DOI: 10.1049/iet-cvi.2016.0128
- Type: Article
Synopsis videos have been proposed to reduce the human effort of browsing long surveillance videos. Traditional synopsis generation methods condense most of the activities in a video by showing several actions simultaneously, even when they originally occurred at different times. This inevitably loses temporal relationships among objects: for example, two persons walking shoulder to shoulder may be detected and tracked separately, so that in the synopsis they never 'meet'. In this study, a trajectory mapping model is defined whose energy function includes not only the cost incurred by the synopsis video but also that of the original video. In this way, it tries to keep the relationships between objects in the synopsis consistent with those in the original video. Finally, the video synopsis is generated by an energy minimisation method. Experiments show that the proposed video synopsis reduces the spatio-temporal redundancy of the input video as much as possible. Moreover, it keeps the important relationships between objects and maintains the time consistency of important activities.
Context-based classification via mixture of hidden Markov model experts with applications in landmine detection
- Author(s): Seniha E. Yuksel and Paul D. Gader
- Source: IET Computer Vision, Volume 10, Issue 8, p. 873 –883
- DOI: 10.1049/iet-cvi.2016.0138
- Type: Article
In many applications, data classification may be hindered by the existence of multiple contexts that produce an input sample. To alleviate the problems associated with multiple contexts, context-based classification uses different classifiers depending on a measure of the context. Context-based classifiers offer the promise of increased performance by allowing classifiers to become experts at classifying inputs of certain types, rather than forcing a single classifier to perform well on all possible inputs. This study introduces a novel mixture of experts (ME) model, the mixture of hidden Markov model experts, for context-based classification of samples that are variable-length sequences, and derives the update equations of a single probabilistic model that learns both the experts and a gate that connects them. The model has a similar high-level structure to the ME model, with the novelty that the gate and the experts are HMMs and the input data are sequences. Experimental results are presented on three datasets, including one for landmine detection. A detailed analysis of the model is provided which, over multiple runs and cross-validation experiments, shows superior results over the compared algorithms.
Variational Bayesian learning for background subtraction based on local fusion feature
- Author(s): Junhua Yan ; Shunfei Wang ; Tianxia Xie ; Yong Yang ; Jiayi Wang
- Source: IET Computer Vision, Volume 10, Issue 8, p. 884 –893
- DOI: 10.1049/iet-cvi.2016.0075
- Type: Article
To resist the adverse effects of shadow interference, illumination changes, poor texture and scenario jitter in object detection and to improve performance, a background modelling method based on a local fusion feature and variational Bayesian learning is proposed. First, a uniform local binary similarity patterns (U-LBSP) texture feature, Lab colour and a location feature are used to construct the local fusion feature. U-LBSP is modified from local binary patterns to reduce computational complexity and better resist the influence of shadow and illumination changes. The joint colour and location features are introduced to deal with poor texture and scenario jitter. Then, a Gaussian mixture model based on the local fusion feature (LFGMM) is updated and learned by variational Bayes. To adapt to dynamically changing scenarios, the variational expectation-maximisation algorithm is applied to optimise the distribution parameters. In this way, the optimal number of Gaussian components as well as their parameters can be estimated automatically in less time. Experimental results show that the authors' method achieves outstanding detection performance, especially under shadow disturbances, illumination changes, poor texture and scenario jitter, with strong robustness and high accuracy.
Multi-band joint local sparse tracking via wavelet transforms
- Author(s): Guang Han ; Heng Luo ; Jixin Liu ; Ning Sun ; Kun Du ; Xiaofei Li
- Source: IET Computer Vision, Volume 10, Issue 8, p. 894 –904
- DOI: 10.1049/iet-cvi.2016.0079
- Type: Article
A novel multi-band joint local sparse tracking algorithm via wavelet transforms is proposed in this study. Since the object image may contain rich information of different types, the authors first use wavelet transforms to decompose the object image into several sub-band images, which helps extract the object's information in different frequency ranges. The same block operation is then executed on all sub-band images, and the ℓ2,1 mixed norm is used to describe the multi-band joint local sparse representation on each patch; this effectively extracts the structural information in the different frequency ranges, so a more accurate object appearance model can be established. Second, the coefficients on the diagonal of the coefficient matrix are extracted as the confidence degrees of the candidate objects in each band, and the confidence degrees across all bands are fused to determine the best candidate object in the current frame, which effectively alleviates object drift. Finally, both qualitative and quantitative evaluations on 15 challenging video sequences demonstrate that the proposed tracking algorithm achieves better tracking results than other state-of-the-art algorithms.
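The ℓ2,1 mixed norm that couples the sub-band representations is simply the sum of the ℓ2 norms of the coefficient-matrix rows, so penalising it drives whole rows (shared across bands) to zero. A minimal sketch, with an assumed toy coefficient matrix:

```python
import numpy as np

def l21_norm(W):
    """l2,1 mixed norm: the sum of the l2 norms of the rows of W.
    Rows index dictionary atoms; columns index the sub-bands."""
    return float(np.linalg.norm(W, axis=1).sum())

W = np.array([[3.0, 4.0],      # row norms: 5, 0 and 1
              [0.0, 0.0],      # a fully zeroed row: joint sparsity
              [1.0, 0.0]])
val = l21_norm(W)              # 5 + 0 + 1 = 6
```

This row-wise grouping is what makes the sparsity pattern consistent across all wavelet bands, unlike an elementwise ℓ1 penalty.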
Human action recognition based on tensor shape descriptor
- Author(s): Jianjun Li ; Xia Mao ; Xingyu Wu ; Xiaogeng Liang
- Source: IET Computer Vision, Volume 10, Issue 8, p. 905 –911
- DOI: 10.1049/iet-cvi.2016.0048
- Type: Article
Human action recognition is an important task. This study presents an efficient framework for recognising actions with a 3D skeleton kinematic joint model in low computational time for practical usage. First, a tensor shape descriptor (TSD) is proposed, which takes advantage of the spatial independence of body joints, avoids the difficult problem of explicit motion estimation required in traditional methods, and preserves the spatial information of each frame; the new TSD is thus a complete and view-invariant descriptor. Second, a novel tensor dynamic time warping (TDTW) method is proposed to measure the joint-to-joint similarity of 3D skeletal body joints locally in the temporal extent, implemented by extending DTW to two multiway data arrays (tensors). Then, a multi-linear projection maps the TSD to a low-dimensional tensor subspace, which is classified by the nearest-neighbour classifier. The experimental results on a public action data set (MSR-Action3D) and a motion capture data set (CMU_Mocap) show that the proposed method achieves comparable or better recognition accuracy than state-of-the-art approaches.
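The TDTW above extends classic dynamic time warping from scalar frames to tensors; the baseline it builds on can be sketched with a pluggable frame-to-frame distance (a tensor distance in the paper's case). The toy sequences below are illustrative assumptions.

```python
import numpy as np

def dtw(a, b, dist):
    """Classic dynamic time warping distance between sequences a and b,
    with a pluggable frame-to-frame distance function."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# A sequence aligned against a time-stretched copy of itself costs zero.
cost = dtw([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 2.0], lambda x, y: abs(x - y))
```

Replacing the scalar `dist` with a distance between per-frame tensors gives the flavour of the tensor extension, though the paper's exact TDTW recursion is not reproduced here.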