IET Computer Vision
Volume 10, Issue 6, September 2016
A comparison of deep multilayer networks and Markov random field matching models for face recognition in the wild
- Author(s): Shervin Rahimzadeh Arashloo
- Source: IET Computer Vision, Volume 10, Issue 6, p. 466–474
- DOI: 10.1049/iet-cvi.2015.0222
- Type: Article

Robustness to a diverse range of image transformations and distortions has been an everlasting goal of visual pattern recognition. While there have been numerous efforts to advance the state of the art in this direction over recent decades, two prominent schemes, among others, are deep multilayer architectures and graphical models, both providing some degree of robustness to undesired image perturbations. In this study, the authors aim to shed some light on the underlying concepts, mechanisms, strengths and potential of each methodology while discussing their relative merits from a practical point of view. In particular, they discuss the underlying motivations for the construction of deep multilayer architectures and undirected graphical models, also known as Markov random fields. The principles behind the construction of each architecture, how invariance properties are achieved in each approach, the efficiency of each approach in terms of the computation required during training and testing, and the degree of human labour required by each approach are discussed. Finally, an experimental comparison of the two frameworks is performed on the challenging problem of face recognition in unconstrained settings in the presence of a wide range of undesirable visual perturbations.
Probabilistic approach for maximum likelihood estimation of pose using lines
- Author(s): Yueqiang Zhang ; Xin Li ; Haibo Liu ; Yang Shang
- Source: IET Computer Vision, Volume 10, Issue 6, p. 475–482
- DOI: 10.1049/iet-cvi.2015.0099
- Type: Article

In this study, the authors propose a new solution to the problem of pose estimation from a set of matched 3D model and 2D image lines. Traditional line-based pose estimation methods that utilise the finite information of the observations assume that the noise at the two endpoints of an image line segment is statistically independent. However, in this study, the authors prove that these two noise terms are negatively correlated when the image line segment is fitted by the least-squares technique to noisy edge points. Moreover, the authors derive a noise model describing the probabilistic relationship between 3D model lines and their finite image observations. Based on the proposed noise model, the maximum-likelihood approach is exploited to estimate the pose parameters. The authors carried out synthetic experiments to compare the proposed method with other pose optimisation methods in the literature. The experimental results show that the proposed method yields clearly higher precision than the traditional methods. The authors also use real image sequences to demonstrate the performance of the proposed method.
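The abstract's key claim — that least-squares line fitting makes the endpoint noise terms negatively correlated — can be checked analytically rather than by simulation: for a linear fit, the fitted values are ŷ = Hy with hat matrix H = X(XᵀX)⁻¹Xᵀ, so for i.i.d. unit-variance noise the covariance of the fitted endpoints is simply the corner entry H[0, n−1]. A minimal sketch (not the authors' derivation):

```python
import numpy as np

def endpoint_covariance(n=20):
    """Covariance between the two fitted endpoints of a least-squares line.

    Fit y = a + b*x to n points with i.i.d. unit-variance noise on y.
    Fitted values are y_hat = H @ y with hat matrix H = X (X^T X)^-1 X^T,
    so Cov(y_hat) = H and H[0, -1] is the endpoint covariance.
    """
    x = np.linspace(-1.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])   # design matrix for a + b*x
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (projection) matrix
    return float(H[0, -1])

print(endpoint_covariance(20))  # negative: endpoints are anti-correlated
```

For symmetric abscissae the corner entry is 1/n + x₁xₙ/Σxᵢ² ≈ 1/n − 3/n < 0, consistent with the paper's claim.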
Zero-shot learning by exploiting class-related and attribute-related prior knowledge
- Author(s): Xuesong Wang ; Chen Chen ; Yuhu Cheng
- Source: IET Computer Vision, Volume 10, Issue 6, p. 483–492
- DOI: 10.1049/iet-cvi.2015.0131
- Type: Article

Existing attribute-based zero-shot learning models ignore some necessary prior knowledge at different levels. Mining attribute-related and class-related prior knowledge, and incorporating it into attribute prediction models, is essential for improving the classification accuracy of zero-shot learning. For mining class-related prior knowledge, a measurement of class–class correlation using whitened cosine similarity is proposed. Likewise, for mining attribute-related prior knowledge, measurements of attribute–class and attribute–attribute correlation using sparse representation coefficients are proposed. On this basis, a novel indirect attribute prediction (IAP) model exploiting class-related and attribute-related prior knowledge (IAP_CAPK) is presented. Experimental results on the Animals with Attributes and a-Pascal/a-Yahoo datasets show that, compared with IAP and direct attribute prediction, the proposed IAP_CAPK not only yields more accurate attribute prediction and zero-shot image classification, but also achieves much higher computational efficiency.
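The class–class correlation measure named in the abstract, whitened cosine similarity, decorrelates features with a whitening transform before taking the cosine. A minimal illustration with PCA whitening (assumed details; not the authors' exact formulation):

```python
import numpy as np

def whitening_matrix(data, eps=1e-8):
    """PCA whitening transform estimated from rows of `data` (samples x dims)."""
    cov = np.cov(data, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

def whitened_cosine(u, v, W):
    """Cosine similarity of two vectors after applying the whitening W."""
    wu, wv = W @ u, W @ v
    return float(wu @ wv / (np.linalg.norm(wu) * np.linalg.norm(wv) + 1e-12))

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))   # stand-in for class feature vectors
W = whitening_matrix(data)
```

After whitening, the sample covariance of the transformed data is (numerically) the identity, so the cosine is computed in a decorrelated space.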
Subclass representation-based face-recognition algorithm derived from the structure scatter of training samples
- Author(s): Shibin Xuan ; Shuenling Xiang ; Haiying Ma
- Source: IET Computer Vision, Volume 10, Issue 6, p. 493–502
- DOI: 10.1049/iet-cvi.2015.0350
- Type: Article

Representation-based face-recognition techniques have received attention in the field of pattern recognition in recent years; however, well-known works focus mainly on constraint conditions and dictionary learning. Few researchers have studied which features of the sample data determine the performance of representation-based classification algorithms. To address this problem, the authors define the structure-scatter degree, which represents the structural features of a training sample set, to determine whether a set is suitable for representation-based classification. Experimental results show that sets with a higher structure-scatter degree are more likely to allow a classification algorithm to obtain a higher recognition rate. Further, the block contribution degree (DBC) of a training sample set is defined to evaluate whether a sample set is suitable for block-based sparse-representation classification algorithms. Experimental results indicate that if the DBC approaches zero, the block technique is unlikely to improve the performance of a representation-based classification algorithm. Thus, the authors devise a self-adaptive optimisation method to generate an optimal block size, an overlapping degree, and a block-weighting scheme. Finally, they propose structure-scatter-based subclass representation classification. Experimental results demonstrate that the proposed algorithm not only improves the recognition accuracy of representation-based classification, but also greatly reduces its time complexity.
Road vanishing point detection using Weber adaptive local filter and salient-block-wise weighted soft voting
- Author(s): Xue Fan and Hyunchul Shin
- Source: IET Computer Vision, Volume 10, Issue 6, p. 503–512
- DOI: 10.1049/iet-cvi.2015.0313
- Type: Article

In this study, a novel and efficient technique is proposed for road vanishing point detection in challenging scenes. Most existing texture-based methods detect the vanishing point using pixel-wise texture orientation estimation and voting map generation, which suffer from high computational complexity. Since only road trails (e.g. road edges, ruts, and tire tracks) contribute informative votes to vanishing point detection, the Weber adaptive local filter is proposed to distinguish road trails from background noise, reducing the workload and eliminating uninformative votes introduced by the background. Furthermore, instead of the conventional pixel-wise voting scheme, salient-block-wise weighted soft voting is developed to eliminate most of the noise votes introduced by incorrectly estimated pixel-wise texture orientations, and to further reduce the computation time of the voting stage. Experimental results on the benchmark dataset demonstrate the superior performance of the proposed method: it is about ten times faster and 3.6% more accurate than a well-known state-of-the-art approach.
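The Weber-style filtering mentioned above rests on the differential excitation of the Weber local descriptor, ξ = arctan(Σᵢ(xᵢ − x_c)/x_c) over the 8-neighbourhood: near-zero excitation marks textureless pixels that can be excluded from voting. A sketch of that excitation map (illustrative, not the authors' adaptive filter):

```python
import numpy as np

def weber_excitation(img, eps=1e-6):
    """Weber differential excitation for each interior pixel of a grey image.

    xi = arctan( sum over 8 neighbours of (x_i - x_c) / x_c ).
    Flat regions give ~0; edges and trails give large magnitudes.
    """
    img = img.astype(float)
    c = img[1:-1, 1:-1]
    acc = np.zeros_like(c)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            acc += img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx] - c
    return np.arctan(acc / (c + eps))
```

Thresholding `abs(weber_excitation(img))` keeps only texture-bearing pixels, which is the role the filter plays before voting.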
Local energy-based multimodal medical image fusion in curvelet domain
- Author(s): Richa Srivastava ; Om Prakash ; Ashish Khare
- Source: IET Computer Vision, Volume 10, Issue 6, p. 513–527
- DOI: 10.1049/iet-cvi.2015.0251
- Type: Article

Multimodal medical images such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography, single photon emission CT and structural MRI have different characteristics and carry different types of complementary anatomical and functional information. Fusion of multimodal images is therefore required to obtain good spatial resolution images carrying both anatomical and functional information. In this work, the authors propose a fusion technique based on the curvelet transform, a multiscale, multidirectional transform with an anisotropy property that is very efficient at capturing edges in images. Edges are important information-carrying points that convey the visual structure of an image. The authors use a local energy-based fusion rule, which is more effective than single-pixel-based fusion rules. Comparison of the proposed method with existing spatial- and wavelet-transform-based methods, in terms of visual and quantitative measures, shows its effectiveness. For quantitative analysis, five fusion metrics are used: entropy, standard deviation, edge strength, sharpness and average gradient.
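The local energy-based fusion rule can be sketched independently of the transform: for each coefficient position, sum the squared coefficients in a small window and keep the coefficient from the source whose neighbourhood has more energy. A minimal sketch operating on two coefficient subbands assumed to come from any multiscale transform (the curvelet transform itself is not reproduced here):

```python
import numpy as np

def local_energy(coeffs, radius=1):
    """Sum of squared coefficients in a (2r+1)x(2r+1) window at each position."""
    sq = np.pad(coeffs.astype(float) ** 2, radius, mode="edge")
    h, w = coeffs.shape
    out = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += sq[dy:dy + h, dx:dx + w]
    return out

def fuse_by_local_energy(a, b, radius=1):
    """Per position, keep the coefficient whose neighbourhood has more energy."""
    return np.where(local_energy(a, radius) >= local_energy(b, radius), a, b)
```

Because the decision uses a neighbourhood rather than a single coefficient, isolated noisy coefficients are less likely to win the selection — the advantage the abstract claims over single-pixel rules.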
Probability-based method for boosting human action recognition using scene context
- Author(s): Hong-Bo Zhang ; Qing Lei ; Duan-Sheng Chen ; Bi-Neng Zhong ; Jialin Peng ; Ji-Xiang Du ; Song-Zhi Su
- Source: IET Computer Vision, Volume 10, Issue 6, p. 528–536
- DOI: 10.1049/iet-cvi.2015.0420
- Type: Article

In this study, the authors investigate the possibility of boosting action recognition performance by exploiting the associated scene context. Towards this end, they model the scene as a mid-level layer that bridges action descriptors and action categories. This is achieved via a scene topic model, in which hybrid visual descriptors, including spatial-temporal action features and scene descriptors, are first extracted from a video sequence. The authors then learn a joint probability distribution between scene and action using a naive Bayes nearest neighbour algorithm, which is adopted to jointly infer action categories online in combination with off-the-shelf action recognition algorithms. The authors demonstrate the advantages of their approach by comparing it with state-of-the-art approaches on several action recognition benchmarks.
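The naive Bayes nearest neighbour (NBNN) component can be sketched in its basic form: for each candidate class, sum every query descriptor's squared distance to its nearest descriptor of that class, and pick the class with the smallest total. A simplified single-descriptor-set illustration (the paper's joint scene-action formulation is more elaborate):

```python
import numpy as np

def nbnn_classify(query_descs, class_descs):
    """Naive Bayes nearest neighbour classification.

    query_descs: (Q, D) descriptors of the query video.
    class_descs: dict label -> (N_label, D) training descriptors.
    Each query descriptor contributes its squared distance to the nearest
    descriptor of each class; the class with the smallest total wins.
    """
    totals = {}
    for label, descs in class_descs.items():
        # pairwise squared distances: queries x class descriptors
        d2 = ((query_descs[:, None, :] - descs[None, :, :]) ** 2).sum(axis=2)
        totals[label] = float(d2.min(axis=1).sum())
    return min(totals, key=totals.get)
```

NBNN avoids descriptor quantisation entirely, which is why it pairs naturally with the hybrid descriptors mentioned in the abstract.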
Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model
- Author(s): Jun Lei ; Guohui Li ; Jun Zhang ; Qiang Guo ; Dan Tu
- Source: IET Computer Vision, Volume 10, Issue 6, p. 537–544
- DOI: 10.1049/iet-cvi.2015.0408
- Type: Article

Continuous action recognition in video is more complicated than traditional isolated action recognition. Besides the high variability of postures and appearances within each action, the complex temporal dynamics of continuous action make this problem challenging. In this study, the authors propose a hierarchical framework combining a convolutional neural network (CNN) and a hidden Markov model (HMM), which recognises and segments continuous actions simultaneously. The authors exploit the CNN's capacity to learn high-level features directly from raw data and use it to extract effective and robust action features. The HMM models the statistical dependencies between adjacent sub-actions and infers the action sequence. To combine the advantages of the two models, a hybrid CNN-HMM architecture is built in which the CNN replaces the Gaussian mixture model for modelling the HMM's emission distribution. The CNN-HMM model is trained using the embedded Viterbi algorithm, with the CNN training data labelled by forced alignment. The authors test their method on two public action datasets, Weizmann and KTH. Experimental results show improved recognition and segmentation accuracy compared with several other methods, and illustrate the superior quality of the features learnt by the CNN.
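The HMM half of such a hybrid can be sketched as Viterbi decoding over per-frame log-emission scores (produced by the CNN in the paper; any scores work for illustration). A minimal sketch, not the authors' embedded-Viterbi training:

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely state (sub-action) sequence for a frame-score matrix.

    log_emis:  (T, S) per-frame log-emission scores (the CNN's role).
    log_trans: (S, S) log transition matrix; log_init: (S,) log priors.
    Returns the decoded state index per frame.
    """
    T, S = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emis[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Self-loop-heavy transitions make the decoder prefer contiguous runs of one sub-action, which is what turns frame-wise scores into a segmentation.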
Region matching based on colour invariants in rgb orthogonal space
- Author(s): Xintong Cui ; Jiangming Kan ; Wenbin Li
- Source: IET Computer Vision, Volume 10, Issue 6, p. 545–550
- DOI: 10.1049/iet-cvi.2015.0020
- Type: Article

Illumination affects the performance of region feature matching based on grey images. A novel region-matching algorithm based on colour invariants and colour-invariant moments in the rgb orthogonal colour space is proposed. First, a colour image is converted from RGB colour space to the rgb orthogonal colour space, which has colour invariance. Second, the colour invariants H and Cλ are calculated. Then, maximally stable extremal regions are extracted from the colour invariants and the colour-invariant moments are computed. Finally, the nearest-neighbour method is used to find corresponding regions. The proposed method exploits both the colour and geometric properties of the images to overcome the influence of illumination. Experimental results on the Amsterdam Library of Object Images database and on images captured on the Beijing Forestry University campus show that the proposed algorithm outperforms existing methods.
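One common construction behind such colour invariance is the normalised rgb chromaticity space, r = R/(R+G+B) and so on, which cancels a uniform scaling of intensity. This is an assumed reading for illustration only; the paper's exact space and its invariants H and Cλ may differ:

```python
import numpy as np

def to_rgb_chromaticity(img):
    """Convert an RGB image (H x W x 3) to normalised rgb chromaticity:
    r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B).
    The result is invariant to a uniform scaling of image intensity."""
    s = img.sum(axis=-1, keepdims=True).astype(float)
    return img / np.where(s == 0, 1.0, s)

rng = np.random.default_rng(1)
img = rng.uniform(1, 255, size=(4, 4, 3))
dimmed = 0.3 * img  # simulated global illumination change
assert np.allclose(to_rgb_chromaticity(img), to_rgb_chromaticity(dimmed))
```

Any feature computed on the chromaticity channels (e.g. region moments) inherits this illumination invariance, which is the property the matching pipeline relies on.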
Image fusion via feature residual and statistical matching
- Author(s): Li-juan Wang ; Jing Han ; Yi Zhang ; Lian-fa Bai
- Source: IET Computer Vision, Volume 10, Issue 6, p. 551–558
- DOI: 10.1049/iet-cvi.2015.0280
- Type: Article

To address the unclear textural information of traditional image fusion based on the discrete wavelet transform (DWT), an effective visible-light and infrared image fusion algorithm via feature residual and statistical matching is proposed in this study. First, the source images are decomposed into low-frequency and high-frequency coefficients by the DWT. Second, two different fusion schemes are designed for the low-frequency and high-frequency coefficients, respectively: the low-frequency coefficients are fused by a local feature-residual-based scheme to achieve adaptive fusion, while the high-frequency coefficients are fused by a local statistical-matching-based scheme to extract edge information effectively. Finally, the fused image is obtained by the inverse DWT. Experimental results demonstrate that the proposed method produces a more accurate fused image and improved performance compared with existing methods.
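The decompose/fuse/reconstruct pipeline can be sketched with a one-level Haar DWT and simple stand-in rules (average the low band, keep the larger-magnitude high-frequency coefficient). These rules are placeholders for the paper's feature-residual and statistical-matching schemes:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT (orthonormal). Returns (LL, LH, HL, HH)."""
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2 (the Haar matrix is self-inverse up to 1/2)."""
    h, w = LL.shape
    out = np.zeros((2 * h, 2 * w))
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2
    out[0::2, 1::2] = (LL - LH + HL - HH) / 2
    out[1::2, 0::2] = (LL + LH - HL - HH) / 2
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2
    return out

def fuse(img_vis, img_ir):
    """Average the low band; keep the larger-magnitude detail coefficient."""
    va, vl, vh, vd = haar_dwt2(img_vis)
    ia, il, ih, idg = haar_dwt2(img_ir)
    pick = lambda x, y: np.where(np.abs(x) >= np.abs(y), x, y)
    return haar_idwt2((va + ia) / 2, pick(vl, il), pick(vh, ih), pick(vd, idg))
```

The transform reconstructs perfectly, so all differences between the fused and source images come from the chosen fusion rules.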
Genetic algorithm-optimised structure of convolutional neural network for face recognition applications
- Author(s): Arash Rikhtegar ; Mohammad Pooyan ; Mohammad Taghi Manzuri-Shalmani
- Source: IET Computer Vision, Volume 10, Issue 6, p. 559–566
- DOI: 10.1049/iet-cvi.2015.0037
- Type: Article

Designing a proper method for face recognition remains a challenging subject in biometric and computer vision applications. Although some reliable systems have been introduced under relatively controlled conditions, their recognition rates are not satisfactory in general settings, especially under variations in pose, illumination, and facial expression. To alleviate these problems, a hybrid face recognition system is proposed that benefits from the strengths of both the convolutional neural network (CNN) and the support vector machine (SVM). To this end, a genetic algorithm is first employed to find the optimum structure of the CNN. Then, the performance of the system is improved by replacing the last layer of the CNN with an ensemble of SVMs. Finally, the decision is made using error-correction concepts. The potential of the CNN as a trainable feature extractor provides a flexible recognition system that can recognise faces under variations in pose and illumination. Simulation results show that the system achieves a good recognition rate and is robust against variations in facial expression, occlusion, noise, and illumination.
Extraction of informative regions of a face for facial expression recognition
- Author(s): Sunil Kumar ; M.K. Bhuyan ; Biplab Ketan Chakraborty
- Source: IET Computer Vision, Volume 10, Issue 6, p. 567–576
- DOI: 10.1049/iet-cvi.2015.0273
- Type: Article

The aim of facial expression recognition (FER) algorithms is to extract discriminative features of a face. However, discriminative features for FER can only be obtained from the informative regions of a face, and each facial subregion has a different impact on different facial expressions. Local binary pattern (LBP)-based FER techniques extract texture features from all regions of a face and stack the features sequentially. This process generates correlated features among different expressions and hence affects accuracy. This research addresses these issues by extracting discriminative features from the informative regions of a face. To this end, the authors propose an informative region extraction model, which models the importance of facial regions based on the projection of expressive face images onto neutral face images. Since neutral images may not be available in practical scenarios, the authors propose to estimate a common reference image using Procrustes analysis. Subsequently, a weighted-projection-based LBP feature is derived from the informative regions of the face and their associated weights. This feature extraction method reduces misclassification among different classes of expressions. Experimental results on standard datasets show the efficacy of the proposed method.
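The LBP texture descriptor underlying such methods is simple to state: each interior pixel is encoded by comparing it with its 8 neighbours, setting one bit per neighbour. A basic sketch (the paper's weighted-projection variant builds on this):

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour local binary pattern for each interior pixel.

    Bit i of a pixel's code is 1 when its i-th neighbour (clockwise from
    top-left) is >= the centre value, giving a code in 0..255.
    """
    img = img.astype(float)
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(int) << bit
    return codes
```

Histograms of these codes over facial regions form the texture features; restricting the histograms to informative regions is precisely the refinement the abstract proposes.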
Bag-of-features for image memorability evaluation
- Author(s): Souad Lahrache ; Rajae El Ouazzani ; Abderrahim El Qadi
- Source: IET Computer Vision, Volume 10, Issue 6, p. 577–584
- DOI: 10.1049/iet-cvi.2015.0383
- Type: Article

Image memorability represents the degree to which images are remembered or forgotten after a period of time. Studying image memorability in computer vision means finding the special characteristics of memorable images in order to develop a representative model of such images. Several approaches have examined features that can affect image memorability. In this study, the authors use bag-of-features as another kind of visual descriptor to assess image memorability. Their method, based on the bag-of-visual-words (BoVW) technique, involves four main steps. First, local image features are extracted from automatically detected regions/points of interest. Then, these local features are encoded by mapping them to a learned visual vocabulary. Next, feature pooling and normalisation techniques are applied to obtain the image's BoVW representation. Finally, this representation is used to treat image memorability as a classification problem. Different implementation choices for each step are presented and their results compared. The authors' method achieves significantly better results than other approaches found in the literature.
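The encoding and pooling steps above can be sketched with hard assignment to a given vocabulary: each local descriptor votes for its nearest visual word, and the counts are normalised into a histogram. A minimal sketch assuming the vocabulary (e.g. k-means centroids) is already available:

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Encode local descriptors as a normalised bag-of-visual-words histogram.

    descriptors: (N, D) local features of one image.
    vocabulary:  (K, D) visual words (e.g. k-means centroids).
    Each descriptor votes for its nearest word (hard assignment); the
    counts are L1-normalised so images with different N are comparable.
    """
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram is what gets fed to the memorability classifier; soft assignment and other pooling schemes are the "implementation choices" the abstract compares.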
Online learning of mixture experts for real-time tracking
- Author(s): S. Gu ; Z. Ma ; M. Xie ; Z. Chen
- Source: IET Computer Vision, Volume 10, Issue 6, p. 585–592
- DOI: 10.1049/iet-cvi.2015.0210
- Type: Article

Template tracking has been extensively investigated in computer vision for various applications. Tracking based on gradient descent using image gradients is one of the most popular object tracking methods. However, it is difficult to define the relationship between the observed data set and the warping function, owing to the unobserved heterogeneity of the data set, which inevitably results in poor tracking performance. This study proposes a novel method based on a hierarchical mixture of experts to perform robust, real-time tracking from stationary cameras. By extending the idea of hyperplane approximation, the proposed approach establishes a hierarchical mixture of generalised linear regression models, instead of a single model, which reduces the non-linear error. Experimental results show significant improvement over the traditional hyperplane approximation (HA) approach.
Image encryption scheme based on block-based confusion and multiple levels of diffusion
- Author(s): Brindha Murugan and Ammasai Gounden Nanjappa Gounder
- Source: IET Computer Vision, Volume 10, Issue 6, p. 593–602
- DOI: 10.1049/iet-cvi.2015.0344
- Type: Article

This study proposes a chaos-based image encryption scheme using the Henon map and the Lorenz equation with multiple levels of diffusion. The Henon map is used for confusion and the Lorenz equation for diffusion. In addition to the Lorenz equation, another matrix of the same size as the original image is generated as a complex function of the original image. This matrix, configured as a diffusion matrix, permits two stages of diffusion and induces a strong sensitivity to the input image. The encryption algorithm has a large key space, entropy very close to eight (for grey images), and very low correlation among adjacent pixels. The highlight of this method is the near-ideal number of pixels change rate and unified average changing intensity it offers; these values indicate that the encrypted images produced by the proposed scheme are random-like. Further, a cryptanalysis study shows that the proposed algorithm is resistant to known attacks.
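The confusion/diffusion pattern can be illustrated with the Henon map alone: iterate the map from a secret initial state, use the sort order of the chaotic sequence as a pixel permutation (confusion), and XOR with a keystream derived from the same sequence (diffusion). A didactic single-round sketch, not the paper's Lorenz-based multi-level scheme:

```python
import numpy as np

def henon_sequence(n, x=0.1, y=0.3, a=1.4, b=0.3):
    """Iterate the Henon map x' = 1 - a*x^2 + y, y' = b*x; return the x orbit."""
    xs = np.empty(n)
    for i in range(n):
        x, y = 1 - a * x * x + y, b * x
        xs[i] = x
    return xs

def encrypt(img, key=(0.1, 0.3)):
    """Confusion: permute pixels by the sort order of a Henon orbit.
    Diffusion: XOR with a keystream derived from the same orbit."""
    flat = img.flatten()
    seq = henon_sequence(flat.size, *key)
    perm = np.argsort(seq)                              # chaotic permutation
    stream = (np.abs(seq) * 1e6).astype(np.int64) % 256
    cipher = flat[perm] ^ stream.astype(flat.dtype)
    return cipher.reshape(img.shape), perm

def decrypt(cipher, perm, key=(0.1, 0.3)):
    """Invert the XOR, then scatter pixels back through the permutation."""
    flat = cipher.flatten()
    seq = henon_sequence(flat.size, *key)
    stream = (np.abs(seq) * 1e6).astype(np.int64) % 256
    out = np.empty_like(flat)
    out[perm] = flat ^ stream.astype(flat.dtype)
    return out.reshape(cipher.shape)
```

Decryption with the same key recovers the image exactly; sensitivity to the initial state (x, y) is what gives chaos-based schemes their large effective key space.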
Two-dimensional principal component analysis-based motion detection framework with subspace update of background
- Author(s): Zongwei Zhou and Zhong Jin
- Source: IET Computer Vision, Volume 10, Issue 6, p. 603–612
- DOI: 10.1049/iet-cvi.2015.0298
- Type: Article

Object detection plays a critical role in automatic video analysis for many vision applications, and background subtraction has been the mainstream approach to moving-object detection. However, most state-of-the-art background subtraction techniques operate on each pixel independently, ignoring the global features of images. A motion detection method based on subspace update of the background is proposed in this study. The method uses a subspace spanned by the principal components of the background sequence to characterise the background, and integrates the regional continuity of objects to segment the foreground. To deal with changes in the background geometry, a learning factor is introduced into the model to update the subspace in a timely manner. Additionally, to reduce computational complexity, two-dimensional principal component analysis (PCA) is used instead of traditional PCA to obtain the principal components of the background. Experiments demonstrate that the update policy is effective and that in most cases the proposed method achieves better results than the other methods compared in this study.
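The 2D-PCA idea can be sketched directly: instead of vectorising frames, compute the image covariance G = mean of (A − Ā)ᵀ(A − Ā) over background frames, keep its top eigenvectors, and flag pixels with a large reconstruction residual as foreground. A static sketch without the paper's learning-factor update:

```python
import numpy as np

def twod_pca_components(frames, k):
    """2-D PCA on image matrices (no vectorisation).

    frames: (N, H, W) background frames.
    Returns the mean frame and the top-k eigenvectors (W, k) of the image
    covariance G = mean over frames of (A - mean)^T (A - mean).
    """
    mean = frames.mean(axis=0)
    G = np.zeros((frames.shape[2], frames.shape[2]))
    for A in frames:
        D = A - mean
        G += D.T @ D
    G /= len(frames)
    vals, vecs = np.linalg.eigh(G)         # ascending eigenvalues
    return mean, vecs[:, ::-1][:, :k]      # top-k columns

def foreground_mask(frame, mean, comps, thresh):
    """Project a frame onto the background subspace; pixels whose
    reconstruction residual exceeds `thresh` are flagged as foreground."""
    D = frame - mean
    recon = (D @ comps) @ comps.T
    return np.abs(D - recon) > thresh
```

Because G is only W x W rather than (H*W) x (H*W), the eigen-decomposition is far cheaper than vectorised PCA, which is the computational saving the abstract cites.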
Articles in this issue:
- A comparison of deep multilayer networks and Markov random field matching models for face recognition in the wild
- Probabilistic approach for maximum likelihood estimation of pose using lines
- Zero-shot learning by exploiting class-related and attribute-related prior knowledge
- Subclass representation-based face-recognition algorithm derived from the structure scatter of training samples
- Road vanishing point detection using Weber adaptive local filter and salient-block-wise weighted soft voting
- Local energy-based multimodal medical image fusion in curvelet domain
- Probability-based method for boosting human action recognition using scene context
- Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model
- Region matching based on colour invariants in rgb orthogonal space
- Image fusion via feature residual and statistical matching
- Genetic algorithm-optimised structure of convolutional neural network for face recognition applications
- Extraction of informative regions of a face for facial expression recognition
- Bag-of-features for image memorability evaluation
- Online learning of mixture experts for real-time tracking
- Image encryption scheme based on block-based confusion and multiple levels of diffusion
- Two-dimensional principal component analysis-based motion detection framework with subspace update of background