IET Computer Vision
Volume 13, Issue 2, March 2019
Guest Editorial: Visual Domain Adaptation and Generalisation
- Source: IET Computer Vision, Volume 13, Issue 2, p. 87–89
- DOI: 10.1049/iet-cvi.2019.0018
- Type: Article
Robust multi-view representation for spatial–spectral domain in application of hyperspectral image classification
- Author(s): Yanshan Li ; Xianchen Wang ; Qinghua Huang ; Xiaohui Hu ; Weixin Xie
- Source: IET Computer Vision, Volume 13, Issue 2, p. 90–96
- DOI: 10.1049/iet-cvi.2018.5112
- Type: Article
Spatial–spectral representation plays an important role in hyperspectral image (HSI) classification. However, many existing local feature algorithms for HSIs operate on two-dimensional images and do not take full advantage of the information hidden in HSIs, such as spatial–spectral locality correlation information, which reduces their robustness. In response to these problems, this study presents a robust multi-view spatial–spectral representation method tailored to the characteristics of HSIs. The method rests on two key techniques: spatial–spectral locality-constrained linear coding (SSLLC) and the spatial–spectral pyramid matching model (SSPM). First, SSLLC exploits the locality of the feature points and visual words and uses the discriminant information provided by the nearest-neighbouring spatial–spectral feature points in HSIs. Second, SSPM partitions the image into increasingly fine sub-cubes and uses these cubes to match the local features of the HSIs. The multi-view representation is tolerant to illumination change, image rotation, affine distortion, etc. To assess the validity of their algorithm, the authors compared their results with several existing approaches, including a deep learning method. The experimental results show that this representation method can effectively improve the accuracy of HSI classification.
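The SSPM step partitions the HSI data cube into increasingly fine sub-cubes and pools local features within each sub-cube. The Python sketch below illustrates that kind of sub-cube pooling on already-quantised visual-word indices; the function name, pyramid levels and vocabulary size are illustrative assumptions, not the authors' exact SSPM formulation.

```python
import numpy as np

def spatial_spectral_pyramid(codes, levels=(1, 2, 4), n_words=64):
    """Pool quantised local-feature codes over increasingly fine sub-cubes.

    codes : 3-D integer array (rows, cols, bands) of visual-word indices.
    Returns the concatenated per-sub-cube histograms (one pyramid vector).
    """
    feats = []
    for level in levels:
        # split every axis of the cube into `level` roughly equal slices
        for rows in np.array_split(codes, level, axis=0):
            for cols in np.array_split(rows, level, axis=1):
                for sub in np.array_split(cols, level, axis=2):
                    hist = np.bincount(sub.ravel(), minlength=n_words)
                    feats.append(hist / max(sub.size, 1))
    return np.concatenate(feats)

# toy usage: random word indices on a 16 x 16 x 8 hyperspectral patch
pyramid = spatial_spectral_pyramid(np.random.randint(0, 64, (16, 16, 8)))
```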
Dictionary-induced least squares framework for multi-view dimensionality reduction with multi-manifold embeddings
- Author(s): Timothy Apasiba Abeo ; Xiang-Jun Shen ; Jian-Ping Gou ; Qi-Rong Mao ; Bing-Kun Bao ; Shuying Li
- Source: IET Computer Vision, Volume 13, Issue 2, p. 97–108
- DOI: 10.1049/iet-cvi.2018.5135
- Type: Article
This study proposes a novel dimensionality reduction (DR) method for multi-view datasets. The principal component analysis (PCA) idea of minimising least-squares reconstruction errors is extended to consider both the data distribution and penalty weights, called a dictionary, so that outlier-free global structures can be recovered from missing and noisy data points. In this way, PCA is viewed as a special instance of the authors' proposed dictionary-induced least squares framework (DLS). Furthermore, to appropriately handle multi-view DR, the authors combine DLS with multiple manifold embeddings (DLSME). The combined method obtains lower-dimensional projections while balancing the preservation of global structures (via DLS) and local structures (via the multi-manifold embeddings). Extensive experiments on object and face recognition datasets verify that DLS achieves better classification results with lower-dimensional projections than PCA. Moreover, on many multi-view datasets for visual recognition and web image annotation, the DLSME method is more effective than graph-Laplacian PCA (gLPCA), robust PCA with optimal mean, canonical correlation analysis (CCA), bilinear models (BLM), neighbourhood preserving embedding, locality preserving projections, and locality sensitive discriminant analysis.
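For reference, the least-squares reconstruction view of PCA that the DLS framework generalises can be written as follows (standard PCA only; the dictionary-weighted extension is not reproduced here):

$$ \min_{U \in \mathbb{R}^{d \times k},\; U^{\top} U = I} \; \left\| X - U U^{\top} X \right\|_F^2 , $$

where $X$ is the centred data matrix and the columns of $U$ span the $k$-dimensional projection subspace.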
Multi-view learning for benign epilepsy with centrotemporal spikes
- Author(s): Ming Yan ; Ling Liu ; Sunitha Basodi ; Yi Pan
- Source: IET Computer Vision, Volume 13, Issue 2, p. 109–116
- DOI: 10.1049/iet-cvi.2018.5162
- Type: Article
Benign epilepsy with centrotemporal spikes (BECT) may be the most common form of epilepsy affecting children. In recent years, more and more studies have shown that magnetic resonance imaging (MRI) and functional magnetic resonance imaging (fMRI) are promising techniques for distinguishing BECT patients from healthy controls. However, existing works suffer from two limitations. On the one hand, they have paid more attention to the brain changes between BECT patients and healthy controls than to developing machine learning methods that can recognise BECT patients. On the other hand, most existing approaches extract hand-crafted features from MRI or fMRI, which cannot achieve the desired performance owing to the limited representational capacity of such features. To address these issues, the authors propose a novel classification method that fuses the predictions of three different views: a hand-crafted feature view, an MRI view and an fMRI view. The final result is obtained by passing these predictions through a fusion neural network. The basic idea is that multiple views can provide complementary information and thus boost classification performance. Extensive experiments show that the proposed multi-view method is remarkably superior to single-view methods.
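A minimal PyTorch sketch of the decision-level fusion idea described above: per-view class probabilities are concatenated and passed through a small fully connected network. The module name and layer sizes are hypothetical and do not reproduce the authors' architecture.

```python
import torch
import torch.nn as nn

class ViewFusion(nn.Module):
    """Fuse per-view class probabilities with a small fully connected network."""
    def __init__(self, n_views=3, n_classes=2, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_views * n_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, view_probs):
        # view_probs: list of (batch, n_classes) prediction tensors, one per view
        return self.net(torch.cat(view_probs, dim=1))

# toy usage: predictions from hand-crafted, MRI and fMRI views for a batch of 4
views = [torch.softmax(torch.randn(4, 2), dim=1) for _ in range(3)]
logits = ViewFusion()(views)
```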
Zero-shot multi-label learning via label factorisation
- Author(s): Hang Shao ; Yuchen Guo ; Guiguang Ding ; Jungong Han
- Source: IET Computer Vision, Volume 13, Issue 2, p. 117–124
- DOI: 10.1049/iet-cvi.2018.5131
- Type: Article
This study considers the zero-shot learning problem under the multi-label setting, where each test sample is associated with multiple labels that are unseen in the training data. The authors propose a novel learning framework based on label factorisation for this problem. Specifically, the framework takes three key issues into consideration and addresses them in a unified way. The first is knowledge transfer, which utilises information from seen classes to build recognition models for unseen classes. The second is label correlation, meaning that labels with different semantics may co-occur frequently; this is an important issue in multi-label learning. The authors propose to learn a shared latent space by label factorisation and to use the label semantics as the decoding function, which addresses both issues. The third is predictability, which requires the learned latent space to be strongly related to the visual features; it is guaranteed by incorporating a regression model into the learning framework. The authors derive two specific formulations from the general framework and propose the corresponding learning algorithms. Extensive experiments on three multi-label data sets demonstrate the effectiveness of the proposed framework.
Joint optimisation convex-negative matrix factorisation for multi-modal image collection summarisation based on images and tags
- Author(s): Wenkai Zhang ; Kun Fu ; Xian Sun ; Yuhang Zhang ; Hao Sun ; Hongqi Wang
- Source: IET Computer Vision, Volume 13, Issue 2, p. 125–130
- DOI: 10.1049/iet-cvi.2017.0568
- Type: Article
Image collection summarisation aims to represent a large-scale multi-modal collection with a small subset of images and tags, helping users navigate large image datasets. Most existing methods leverage the contribution of text to the visual summary while ignoring the visual contribution to the textual topic; when the tags are weakly labelled, the textual topic cannot accurately reflect the visual summary. To solve this, the authors propose a novel model, joint optimisation of convex non-negative matrix factorisation, which incorporates images and tags in a mutually beneficial way. The objective function contains visual and textual error terms that share the same indicator matrix, connecting the relations between the two modalities. An iterative algorithm is then proposed to optimise the model. Finally, the authors explore the effects of different visual feature representations (e.g. bag-of-words and deep learning features) on the multi-modal collection summary. The proposed method is compared with state-of-the-art algorithms on two multi-modal datasets (MIRFlickr and NUS-WIDE-SCENE), and the experimental results demonstrate its effectiveness.
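Written schematically, a joint convex-NMF objective of the kind described (visual and textual error terms sharing one indicator matrix) might take the form

$$ \min_{W_v,\, W_t,\, G \ge 0} \; \left\| X_v - X_v W_v G^{\top} \right\|_F^2 + \lambda \left\| X_t - X_t W_t G^{\top} \right\|_F^2 , $$

where $X_v$ and $X_t$ are the visual and textual feature matrices, $W_v$ and $W_t$ are modality-specific coefficient matrices, $G$ is the shared indicator matrix and $\lambda$ balances the two terms; this notation is assumed for illustration rather than taken from the paper.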
End-to-end visual grounding via region proposal networks and bilinear pooling
- Author(s): Chenchao Xiang ; Zhou Yu ; Suguo Zhu ; Jun Yu ; Xiaokang Yang
- Source: IET Computer Vision, Volume 13, Issue 2, p. 131–138
- DOI: 10.1049/iet-cvi.2018.5104
- Type: Article
Phrase-based visual grounding aims to localise the object in an image referred to by a textual query phrase. Most existing approaches adopt a two-stage mechanism: first, an off-the-shelf proposal generation model is used to extract region-based visual features, and then a deep model scores the proposals based on the query phrase and the extracted visual features. In contrast, the authors design an end-to-end approach to the visual grounding problem in this study. They use a region proposal network to generate object proposals and the corresponding visual features simultaneously, and a multi-modal factorised bilinear pooling model to fuse the multi-modal features effectively. On top of the fused features, two novel losses are imposed to rank and refine the proposals, respectively. To verify the effectiveness of the proposed approach, the authors conduct experiments on three real-world visual grounding datasets, namely Flickr-30k Entities, ReferItGame and RefCOCO. The experimental results demonstrate the significant superiority of the proposed method over existing state-of-the-art methods.
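The multi-modal factorised bilinear (MFB) pooling operator mentioned above can be sketched as follows: both modalities are projected, multiplied element-wise, sum-pooled over groups of k factors, and then signed-square-root and L2 normalised. The dimensions and the value of k below are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def mfb_pool(x_img, x_txt, U, V, k=5):
    """Multi-modal factorised bilinear (MFB) pooling sketch.

    x_img, x_txt : 1-D feature vectors from the two modalities.
    U, V         : projection matrices of shape (dim_modality, k * d_out).
    """
    joint = (U.T @ x_img) * (V.T @ x_txt)               # (k * d_out,)
    pooled = joint.reshape(-1, k).sum(axis=1)           # sum-pool every k factors
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # signed square root
    return pooled / (np.linalg.norm(pooled) + 1e-12)    # L2 normalisation

# toy usage: 256-d region feature, 128-d phrase feature, 5 x 64 factors
rng = np.random.default_rng(0)
z = mfb_pool(rng.normal(size=256), rng.normal(size=128),
             rng.normal(size=(256, 5 * 64)), rng.normal(size=(128, 5 * 64)))
```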
Correlation-guided multi-object tracking with correlation feature transfer
- Author(s): Jiatong Li ; Yanjie Zhao ; Zhiguo Jiang
- Source: IET Computer Vision, Volume 13, Issue 2, p. 139–145
- DOI: 10.1049/iet-cvi.2018.5011
- Type: Article
Here, the authors propose a correlation-guided Markov chain Monte Carlo (MCMC) solver to improve the efficiency of tracking multiple objects under a recursive Bayesian filtering framework. Instead of randomly proposing target locations according to a fixed distribution, the method guides the MCMC solver to sample among locations where the targets are more likely to appear. These highly probable locations are obtained for each target from its correlation response map, computed by evaluating the correlation between the target appearance and its online model. Furthermore, the proposed tracking framework naturally transfers rich domain-specific offline correlation features into the target's online model. With the calculations performed in the Fourier domain and a reversible-jump strategy for MCMC, the correlation-guided method can track a varying number of objects with high efficiency, while the correlation feature transfer strategy improves tracking precision with easy offline training. The proposed method is evaluated on both synthetic and real-scenario videos, and the experimental results demonstrate its effectiveness and superior performance against its counterparts.
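The response maps that guide the MCMC proposals are standard correlation-filter responses computed in the Fourier domain. A minimal sketch follows (generic correlation filtering only; the guided proposal and reversible-jump machinery are not reproduced).

```python
import numpy as np

def correlation_response(patch, model_fft):
    """Correlation response between an image patch and an online target model.

    patch     : 2-D grey-level search window around the previous location.
    model_fft : FFT of an already learned correlation-filter template,
                same shape as `patch`.
    High values in the returned map mark likely target locations, which can
    guide the MCMC proposal distribution instead of sampling blindly.
    """
    response = np.real(np.fft.ifft2(np.conj(model_fft) * np.fft.fft2(patch)))
    return np.fft.fftshift(response)

# toy usage: propose locations near the peak of the response map
patch = np.random.rand(64, 64)
resp = correlation_response(patch, np.fft.fft2(np.random.rand(64, 64)))
peak = np.unravel_index(np.argmax(resp), resp.shape)
```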
Dual-scale weighted structural local sparse appearance model for object tracking
- Author(s): Xianyou Zeng ; Long Xu ; Yigang Cen ; Ruizhen Zhao ; Wanli Feng
- Source: IET Computer Vision, Volume 13, Issue 2, p. 146–156
- DOI: 10.1049/iet-cvi.2018.5158
- Type: Article
It is a great challenge to develop an effective appearance model for robust visual tracking owing to various interfering factors such as pose change, occlusion and background clutter. An increasing number of visual tracking methods exploit local appearance models to deal with these challenges. In this study, the authors present a simple yet effective weighted structural local sparse appearance model, which better describes the target appearance through patch-based generative weights. To further improve the robustness of tracking, they implement this appearance model on patches at two scales. The two derived appearance models are then combined into a collaborative model that exploits their complementary advantages. Extensive experiments on the tracking benchmark dataset show that the proposed method performs favourably against several state-of-the-art methods.
Object detection and tracking under complex environment using deep learning-based LPM
- Author(s): Yundong Li ; Xueyan Zhang ; Hongguang Li ; Qichen Zhou ; Xianbin Cao ; Zhifeng Xiao
- Source: IET Computer Vision, Volume 13, Issue 2, p. 157–164
- DOI: 10.1049/iet-cvi.2018.5129
- Type: Article
Object detection and tracking in complex environments are challenging because of disturbances induced by background clutter, illumination changes, occlusions and other factors. Most traditional algorithms rely on hand-crafted features, which are not sufficiently robust in complex environments. Moreover, the processes of detection and tracking are separated, which lowers the overall efficiency. In this study, a novel local probability model (LPM)-based mean shift (MS) algorithm is proposed to integrate object detection and tracking. The main contributions are: (i) a new framework based on the combination of LPM and MS is established for the integration of object tracking and detection; (ii) for object detection, the training and prediction of the LPM are built on stacked denoising autoencoder-based deep learning; (iii) for object tracking, an MS tracking algorithm leveraging the LPM is modified to improve tracking efficiency in complex environments. Experimental results demonstrate that the proposed method is superior to colour-histogram-based MS and histogram-of-oriented-gradients-based MS in terms of robustness and tracking accuracy.
Real-time face recognition based on pre-identification and multi-scale classification
- Author(s): Weidong Min ; Mengdan Fan ; Jing Li ; Qing Han
- Source: IET Computer Vision, Volume 13, Issue 2, p. 165–171
- DOI: 10.1049/iet-cvi.2018.5586
- Type: Article
In face recognition, searching for a person's face across the whole picture is generally too time-consuming to ensure high detection accuracy. Objects similar to the human face, or multi-view faces in low-resolution images, may cause face recognition to fail. To alleviate these problems, a real-time face recognition method based on pre-identification and multi-scale classification is proposed in this study. The face area is segmented based on the proportion of human faces in the pedestrian area to reduce the search range, and faces can be robustly detected in complicated scenarios such as heads that move frequently or are turned at large angles. To accurately recognise small-scale faces, the authors propose a multi-scale and multi-channel shallow convolution network, which combines a multi-scale mechanism on the feature map with a multi-channel convolution network for real-time face recognition. It performs face matching only in the pre-identified face areas instead of the whole image and is therefore more efficient. Experimental results show that the proposed real-time face recognition method detects and recognises faces correctly, and outperforms existing methods in terms of effectiveness and efficiency.
Virtual samples and sparse representation-based classification algorithm for face recognition
- Author(s): Yali Peng ; Lingjun Li ; Shigang Liu ; Jun Li ; Han Cao
- Source: IET Computer Vision, Volume 13, Issue 2, p. 172–177
- DOI: 10.1049/iet-cvi.2018.5096
- Type: Article
Because the acquisition environment and equipment are not fully controllable, face image acquisition is inevitably affected by external factors, and usually only a small number of face images are available. Insufficient samples are not conducive to face recognition, so a popular scheme is to produce virtual samples from the available training samples. In this study, the authors first take the symmetry of the human face into account and propose a novel method to generate virtual samples. A representation-based classification method and a score fusion strategy are then applied to both the original face images and the virtual images to perform face recognition. Several sparse representation-based classification algorithms are compared on the ORL, FERET and GT databases. Experimental results show that the authors' method is effective for improving face recognition.
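One common way to exploit facial symmetry for virtual sample generation is to mirror each half of the face; the sketch below illustrates this idea and is not necessarily the authors' exact construction rule.

```python
import numpy as np

def symmetry_virtual_samples(face):
    """Generate virtual face images from left/right facial symmetry.

    face : 2-D grey-level image (rows, cols) with an even number of columns.
    Returns two virtual images: the left half joined with its mirror, and the
    mirror of the right half joined with the right half.
    """
    h, w = face.shape
    left, right = face[:, : w // 2], face[:, w // 2 :]
    virtual_left = np.hstack([left, left[:, ::-1]])     # mirror the left half
    virtual_right = np.hstack([right[:, ::-1], right])  # mirror the right half
    return virtual_left, virtual_right

# toy usage on a random 112 x 92 'face' (the ORL image size)
v1, v2 = symmetry_virtual_samples(np.random.rand(112, 92))
```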
Co-occurrence matching of local binary patterns for improving visual adaption and its application to smoke recognition
- Author(s): Feiniu Yuan ; Jinting Shi ; Xue Xia ; Qinghua Huang ; Xuelong Li
- Source: IET Computer Vision, Volume 13, Issue 2, p. 178–187
- DOI: 10.1049/iet-cvi.2018.5164
- Type: Article
It is challenging to recognize smoke in visual scenes due to large variations of smoke colors, textures and shapes. To improve robustness, we propose a novel feature extraction method based on similarity and dissimilarity matching measures of local binary patterns (LBP). Given the two bit-sequences of an LBP code pair, the similarity and dissimilarity matching measures are defined as the ratios of the 1–1 bitwise matching number to the 0–0 bitwise matching number and of the 1–0 number to the 0–1 number, respectively. To capture local code variations, we calculate the measures between the LBP codes of a center pixel and its neighbors. Then we compare each measure with its global mean to propose similarity matching based local binary patterns (SMLBP) and dissimilarity matching based local binary patterns (DMLBP). Since SMLBP and DMLBP extract spatial variations of the first-order LBP codes, they actually represent second-order variations of pixel values. Furthermore, we adopt different mapping modes and multi-scale neighborhoods to obtain rotation and scale invariance. Finally, we concatenate the histograms of LBP, SMLBP and DMLBP to generate a feature vector containing first- and second-order information. Experiments show that our method clearly outperforms existing methods.
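The matching measures defined above can be computed directly from a pair of LBP codes, as in the following sketch (the subsequent thresholding against global means that forms the SMLBP/DMLBP codes is omitted).

```python
def lbp_matching_measures(code_a, code_b, bits=8, eps=1e-6):
    """Similarity and dissimilarity matching measures between two LBP codes.

    Following the definitions in the abstract: the similarity measure is the
    ratio of the number of 1-1 bit matches to the number of 0-0 bit matches,
    and the dissimilarity measure is the ratio of 1-0 matches to 0-1 matches.
    """
    n11 = n00 = n10 = n01 = 0
    for i in range(bits):
        a, b = (code_a >> i) & 1, (code_b >> i) & 1
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a and not b:
            n10 += 1
        else:
            n01 += 1
    similarity = n11 / (n00 + eps)
    dissimilarity = n10 / (n01 + eps)
    return similarity, dissimilarity

# toy usage: measures between a centre pixel's LBP code and a neighbour's
sim, dis = lbp_matching_measures(0b10110100, 0b10011100)
```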
Regularised transfer learning for hyperspectral image classification
- Author(s): Qian Shi ; Yipeng Zhang ; Xiaoping Liu ; Kefei Zhao
- Source: IET Computer Vision, Volume 13, Issue 2, p. 188–193
- DOI: 10.1049/iet-cvi.2018.5145
- Type: Article
This study presents a transfer learning method for addressing the insufficient-sample problem in hyperspectral image classification. In order to find a common feature representation for both the source domain and the target domain, the authors introduce a regularisation term based on Bregman divergence into the objective function of the subspace learning algorithm, which minimises the Bregman divergence between the distribution of training samples in the source domain and that of test samples in the target domain. A hyperspectral image with biased sampling is used to evaluate the effectiveness of the proposed method. The results show that the proposed method achieves higher classification accuracy than traditional subspace learning methods under the condition of biased sampling.
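For a strictly convex generator $\phi$, the Bregman divergence used in such a regulariser is defined as

$$ D_{\phi}(p, q) = \phi(p) - \phi(q) - \left\langle \nabla \phi(q),\; p - q \right\rangle , $$

and the regularised subspace objective can be written schematically as $\min_{W} J(W) + \lambda\, D_{\phi}\!\big(p_s(W^{\top}X_s),\, p_t(W^{\top}X_t)\big)$, where $J$ is the base subspace-learning criterion, $\lambda$ a trade-off weight, and $p_s$, $p_t$ the distributions of the projected source and target samples; this composite form is an assumed schematic, not the paper's exact formulation.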
On combining active and transfer learning for medical data classification
- Author(s): Xinyao Tang ; Bo Du ; Jianzhong Huang ; Zengmao Wang ; Lefei Zhang
- Source: IET Computer Vision, Volume 13, Issue 2, p. 194–205
- DOI: 10.1049/iet-cvi.2017.0524
- Type: Article
This study presents a novel algorithm that combines active learning (AL) and transfer learning for medical data classification. The main idea is to iteratively query a small number of informative unlabelled target samples while removing the source samples that do not fit the posterior probability distribution of the target domain, thereby combining the basic idea of AL with transfer learning. Experimental results on datasets from the University of California Irvine (UCI) Machine Learning Repository and The Cancer Imaging Archive (TCIA) confirm the effectiveness of the proposed algorithm.
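A hypothetical sketch of one such iteration is shown below, combining uncertainty-based querying of target samples with removal of source samples that fit the target posterior poorly; the selection and removal criteria here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def al_transfer_round(Xs, ys, Xt_lab, yt_lab, Xt_pool, n_query=5, fit_thresh=0.3):
    """One round of a hypothetical active/transfer-learning loop.

    Trains on source + labelled target data, queries the most uncertain
    unlabelled target samples, and drops source samples whose predicted
    posterior for their own label is low.  Labels are assumed to be
    integers 0..K-1 so they index predict_proba columns directly.
    """
    clf = LogisticRegression(max_iter=1000).fit(
        np.vstack([Xs, Xt_lab]), np.concatenate([ys, yt_lab]))

    # uncertainty sampling: smallest margin between the top-two class posteriors
    probs = np.sort(clf.predict_proba(Xt_pool), axis=1)
    query_idx = np.argsort(probs[:, -1] - probs[:, -2])[:n_query]

    # keep only source samples with a sufficiently high posterior for their label
    ps = clf.predict_proba(Xs)[np.arange(len(ys)), ys]
    keep_src = ps >= fit_thresh
    return query_idx, keep_src

# toy usage with random two-class data
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(50, 4)), rng.integers(0, 2, 50)
Xt_lab, yt_lab = rng.normal(size=(10, 4)) + 0.5, rng.integers(0, 2, 10)
queries, keep = al_transfer_round(Xs, ys, Xt_lab, yt_lab, rng.normal(size=(30, 4)))
```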
ILGNet: inception modules with connected local and global features for efficient image aesthetic quality classification using domain adaptation
- Author(s): Xin Jin ; Le Wu ; Xiaodong Li ; Xiaokun Zhang ; Jingying Chi ; Siwei Peng ; Shiming Ge ; Geng Zhao ; Shuying Li
- Source: IET Computer Vision, Volume 13, Issue 2, p. 206–212
- DOI: 10.1049/iet-cvi.2018.5249
- Type: Article
In this study, the authors address the challenging problem of aesthetic image classification, which is to label an input image as of high or low aesthetic quality. Both the local and global features of images are taken into consideration. A novel deep convolutional neural network named ILGNet is proposed, which combines inception modules with a layer connecting both local and global features. ILGNet is based on GoogLeNet, so it is easy to start from a GoogLeNet pre-trained on a large-scale image classification problem and fine-tune the connected layers on a large-scale database of aesthetics-related images, AVA, i.e. domain adaptation. The experiments reveal that the model achieves state-of-the-art results on the AVA database, and both its training and testing speeds are higher than those of the original GoogLeNet.
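A generic fine-tuning recipe of the kind alluded to is sketched below, starting from a torchvision GoogLeNet pre-trained on ImageNet and replacing its classifier head for the binary aesthetic-quality task; this is not ILGNet itself, which adds inception modules and a connected local/global feature layer.

```python
import torch.nn as nn
from torchvision import models

# GoogLeNet pre-trained on ImageNet (older torchvision versions use pretrained=True)
model = models.googlenet(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)   # new head: high/low aesthetic quality

# optionally freeze the backbone and train only the new head at first
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")
```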
Approach to 3D face reconstruction through local deep feature alignment
- Author(s): Jian Zhang ; Chaoyang Zhu
- Source: IET Computer Vision, Volume 13, Issue 2, p. 213–223
- DOI: 10.1049/iet-cvi.2018.5151
- Type: Article
Here, the authors propose an end-to-end method based on deep learning to reconstruct three-dimensional (3D) face models from given face images. In the training stage, feature representations are extracted from the 3D sample faces and the corresponding 2D sample images through the proposed local deep feature alignment (LDFA) algorithm, and an explicit mapping from the 2D features to their 3D counterparts is estimated for each local neighbourhood; a feed-forward deep neural network is then learned for each neighbourhood, whose parameters are initialised with those obtained in the locality-aware learning process and the explicit mapping. In the testing stage, a given face image only needs to be fed to the deep neural network corresponding to the nearest sample image to obtain the output 3D face model. Extensive experiments have been conducted on both non-face and face data sets. The authors find that the LDFA algorithm performs better than several popular unsupervised feature extraction algorithms, and that the 3D reconstruction results obtained by the proposed method also outperform those of the comparison methods.
Restoration algorithm for noisy complex illumination
- Author(s): Zhanwen Liu ; Tao Gao ; Fanjie Kong ; Ziheng Jiao ; Aodong Yang ; Shuying Li ; Bo Liu
- Source: IET Computer Vision, Volume 13, Issue 2, p. 224–232
- DOI: 10.1049/iet-cvi.2018.5163
- Type: Article
Although promising results have been achieved in the restoration of complex-illumination images with the Retinex algorithm, Retinex processing still has some drawbacks. Considering the noise characteristics of complex-illumination images, this study proposes a novel restoration algorithm for noisy complex illumination, which combines guided adaptive multi-scale Retinex (GAMSR) with improved BayesShrink threshold filtering (IBTF) in the double-density dual-tree complex wavelet transform (DDDTCWT) domain. Extensive restoration experiments are conducted on three typical types of images and on the same image with different noises. On the basis of a series of evaluation indexes, the method is compared with state-of-the-art algorithms. The results show that (i) the SSIM of the proposed IBTF is 15% higher than that of the traditional Bayes threshold method when the standard variance of the noise is 100; (ii) the PSNR of the proposed GAMSR is 15% higher than that of the traditional MSR; and (iii) the clarity of the restored results is about three times that of the original images, while the information entropy is also slightly improved. Therefore, the proposed method can effectively enhance the details, edges and textures of images under complex illumination and noise.
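For context, the classic multi-scale Retinex (MSR) baseline that GAMSR builds on subtracts log-domain Gaussian-smoothed illumination estimates at several scales. A minimal sketch follows (the guided adaptive weighting and the DDDTCWT-domain denoising are not reproduced).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multi_scale_retinex(img, sigmas=(15, 80, 250)):
    """Classic multi-scale Retinex (MSR) on a single-channel image.

    The GAMSR variant in the paper adds guided, adaptive weighting; this is
    only the standard MSR baseline it builds on.
    """
    img = img.astype(np.float64) + 1.0              # avoid log(0)
    msr = np.zeros_like(img)
    for sigma in sigmas:
        # reflectance estimate at one scale: log(image) - log(smoothed illumination)
        msr += np.log(img) - np.log(gaussian_filter(img, sigma) + 1.0)
    return msr / len(sigmas)

# toy usage on a random low-light image
out = multi_scale_retinex(np.random.rand(128, 128) * 50)
```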
Maximum simplex volume: an efficient unsupervised band selection method for hyperspectral image
- Author(s): Xuefeng Jiang ; Lin Zhang ; Junrui Liu ; Shuying Li
- Source: IET Computer Vision, Volume 13, Issue 2, p. 233–239
- DOI: 10.1049/iet-cvi.2018.5143
- Type: Article
Hyperspectral imaging makes it possible to obtain object information with fine spectral as well as spatial resolution, which benefits a wide array of applications. However, there is high correlation among the bands of a hyperspectral image (HSI). Band selection (BS), i.e. selecting only a few representative bands that describe the original image well, is an appropriate approach to this problem. In this study, the authors propose an efficient greedy unsupervised BS method, namely the maximum simplex volume by orthogonal-projection BS method. The main contributions are two-fold: (i) an information-lossless compressed descriptor in the Euclidean sense that reduces the amount of redundant information in the band analysis, and (ii) an orthogonal-projection-based algorithm to find the band points forming the simplex of maximum volume. Experimental results on four real HSIs demonstrate that the proposed method achieves satisfactory pixel classification performance and is computationally fast.
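A simple greedy orthogonal-projection heuristic in the same spirit is sketched below: at each step the band with the largest residual after projection onto the already-selected bands is added. This is a stand-in for illustration, not the paper's exact maximum-simplex-volume algorithm.

```python
import numpy as np

def op_band_selection(X, n_bands):
    """Greedy band selection by maximising residual orthogonal projections.

    X : (n_pixels, n_total_bands) matrix, one column per spectral band.
    Repeatedly picks the band whose residual, after projecting onto the
    subspace spanned by the already-selected bands, has the largest norm.
    """
    X = X - X.mean(axis=0)                       # centre each band
    selected = [int(np.argmax(np.linalg.norm(X, axis=0)))]
    for _ in range(n_bands - 1):
        B = X[:, selected]                       # basis of selected bands
        proj = B @ np.linalg.pinv(B) @ X         # projection onto span(B)
        residual = np.linalg.norm(X - proj, axis=0)
        residual[selected] = -np.inf             # never reselect a band
        selected.append(int(np.argmax(residual)))
    return selected

# toy usage: pick 5 bands from a 100-pixel, 30-band image
bands = op_band_selection(np.random.rand(100, 30), 5)
```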
Image fusion method based on simultaneous sparse representation with non-subsampled contourlet transform
- Author(s): Guiqing He ; Siyuan Xing ; Xingjian He ; Jun Wang ; Jianping Fan
- Source: IET Computer Vision, Volume 13, Issue 2, p. 240–248
- DOI: 10.1049/iet-cvi.2018.5496
- Type: Article
Image fusion methods based on sparse representation in the single-scale image domain now produce better fusion results than classic methods based on multi-scale analysis. However, owing to the limited number of dictionary atoms, sparse-representation-based fusion methods struggle to describe image details accurately and are time-consuming. In this study, a novel dictionary is constructed with the non-subsampled contourlet transform and sparse representation by using the proposed simultaneous strategy, so that the dictionary combines the sparsity of a learned dictionary with the multi-scale features of the non-subsampled contourlet transform. Moreover, the simultaneous strategy is combined with this dictionary so that sparse coefficients are represented with the same dictionary atoms and can thus be compared in a reasonable and accurate way. Finally, an image fusion method built on this dictionary is proposed and named non-subsampled contourlet transform (NSCT)–simultaneous sparse representation (SSR). Experimental results show that the proposed NSCT–SSR method outperforms existing fusion methods based on the multi-scale domain and on sparse representation in the single-scale image domain, with a better fusion effect and stronger anti-noise capability.