IET Computer Vision
Volume 11, Issue 8, December 2017
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 621–622
- DOI: 10.1049/iet-cvi.2017.0496
- Type: Article
- Author(s): Maryam Koohzadi and Nasrollah Moghadam Charkari
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 623–632
- DOI: 10.1049/iet-cvi.2016.0355
- Type: Article
This survey addresses one of the most important issues in human action recognition: how to create data representations with a high level of abstraction from large, noisy, high-dimensional video data. Most recent successful studies in this area focus on deep learning, and deep learning methods have gained superiority over other approaches in image recognition. The authors first investigate the role of deep learning in both image and video processing and recognition. Owing to the variety and abundance of deep learning methods, they discuss them in comparative form, presenting an analytical framework to classify and evaluate these methods against a set of important functional measures. Furthermore, a categorisation of the state-of-the-art deep learning approaches for human action recognition is presented. The authors summarise the most relevant works under each approach and discuss their performance.
- Author(s): Giampaolo Pagnutti ; Ludovico Minto ; Pietro Zanuttigh
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 633–642
- DOI: 10.1049/iet-cvi.2016.0502
- Type: Article
We present an approach for segmentation and semantic labelling of RGBD data that jointly exploits geometrical cues and deep learning techniques. An initial over-segmentation is performed using spectral clustering, and a set of non-uniform rational B-spline (NURBS) surfaces is fitted to the extracted segments. A convolutional neural network (CNN) then receives as input the colour and geometry data together with the surface fitting parameters. The network consists of nine convolutional stages followed by a softmax classifier and produces a vector of descriptors for each sample. Next, an iterative merging algorithm recombines the output of the over-segmentation into larger regions matching the various elements of the scene. Pairs of adjacent segments with high similarity according to the CNN features are candidates for merging, and the surface fitting accuracy is used to detect which pairs of segments belong to the same surface. Finally, a set of labelled segments is obtained by combining the segmentation output with the descriptors from the CNN. Experimental results show that the proposed approach outperforms state-of-the-art methods and provides accurate segmentation and labelling.
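The iterative merging step can be sketched as a minimal greedy merge over CNN descriptors, assuming cosine similarity and a fixed threshold; the paper's actual criterion also uses the surface fitting accuracy, which is omitted here:

```python
import math

def cosine(a, b):
    # Cosine similarity between two (non-zero) descriptor vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def merge_segments(descriptors, adjacency, threshold=0.9):
    """Greedily merge adjacent segments whose descriptors are similar.

    descriptors: {segment_id: feature vector}
    adjacency: set of frozenset pairs of adjacent segment ids
    Returns a mapping from original segment id to merged-cluster id.
    """
    parent = {s: s for s in descriptors}

    def find(s):
        # Union-find root lookup with path compression.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    changed = True
    while changed:
        changed = False
        # Score every adjacent pair; merge the most similar one above threshold.
        best, best_sim = None, threshold
        for pair in adjacency:
            a, b = tuple(pair)
            ra, rb = find(a), find(b)
            if ra == rb:
                continue
            sim = cosine(descriptors[a], descriptors[b])
            if sim > best_sim:
                best, best_sim = (ra, rb), sim
        if best:
            parent[best[1]] = best[0]
            changed = True
    return {s: find(s) for s in descriptors}
```

Segment ids, descriptor values and the greedy one-merge-per-iteration policy are illustrative assumptions, not the authors' exact algorithm.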
- Author(s): Xulei Yang ; Zeng Zeng ; Su Yi
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 643–649
- DOI: 10.1049/iet-cvi.2016.0482
- Type: Article
This work conducts a feasibility study of deep learning approaches for automatic segmentation of the left ventricle (LV) cavity in cardiac magnetic resonance (CMR) images. Automatic LV cavity segmentation is challenging, partly because of the small size of the object compared with the large CMR image background, especially at the apex. To cater for small-object segmentation, the authors present a localisation-segmentation framework that first locates the object in the full image and then segments it within the small cropped region of interest. Localisation is performed by a deep regression model based on convolutional neural networks, while segmentation is done by deep neural networks based on the U-Net architecture. The authors also employ the Dice loss function when training the segmentation models, to investigate its effect on segmentation performance. The models are trained and evaluated on public endocardium-annotated CMR datasets from the York University and MICCAI 2009 LV Challenge websites, on which the average Dice metric values of the proposed framework are 0.91 and 0.93, respectively. These results are promising compared with the best achieved by the current state of the art, and show the potential of deep learning approaches for this application.
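The Dice loss used to train the segmentation model has a standard soft form; this is a minimal pure-Python sketch over flattened probability maps, not the authors' implementation:

```python
def dice_coefficient(pred, target, eps=1e-6):
    # Soft Dice: 2|A ∩ B| / (|A| + |B|), computed on flattened maps.
    # eps avoids division by zero for empty masks.
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (total + eps)

def dice_loss(pred, target):
    # Loss decreases as the predicted mask overlaps the target mask.
    return 1.0 - dice_coefficient(pred, target)
```

A perfect prediction gives a loss near 0; disjoint masks give a loss near 1.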
- Author(s): Ali Maina Bukar and Hassan Ugail
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 650–655
- DOI: 10.1049/iet-cvi.2016.0486
- Type: Article
In recent years, automatic facial age estimation has gained popularity owing to its numerous applications. Much work has been done on frontal images, and minimal estimation errors have lately been achieved on most benchmark databases. In reality, however, images obtained in unconstrained environments are not always frontal; when conducting a demographic study or crowd analysis, for instance, one may obtain only profile images of the face. To the best of the authors' knowledge, no attempt has been made to estimate age from side views of face images. Here the authors address this gap by using a pretrained deep residual neural network to extract features, and then a sparse partial least-squares regression to estimate age. Despite profile views carrying less information than frontal images, the results show that the extracted deep features achieve promising performance.
- Author(s): Saraswathi Duraisamy and Srinivasan Emperumal
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 656–662
- DOI: 10.1049/iet-cvi.2016.0425
- Type: Article
In this study, a novel deep learning-based framework for classifying digital mammograms is introduced. The methodology models the presence of tumour tissue with level sets combined with deep learning. Because of the low contrast between normal and lesion tissues, mammogram images are difficult to segment robustly; therefore, the Chan–Vese level-set method is used to extract the initial contour, and a deep learning convolutional neural network (DL-CNN) is used to learn the features of mammary-specific masses and microcalcification clusters. To increase classification accuracy and reduce false positives, a fully complex-valued relaxation network classifier is used in the last stage of the DL-CNN. Experimental results on the standard benchmark breast cancer datasets (MIAS and BCDR) show that the proposed method significantly outperforms traditional methods, achieving an accuracy of 99%, sensitivity of 0.9875, specificity of 1.0 and AUC of 0.9815. The proposed framework performs well in classifying digital mammograms as normal, benign or malignant, as well as their subclasses.
Guest Editorial: Deep Learning in Computer Vision
Survey on deep learning methods in human action recognition
Segmentation and semantic labelling of RGBD data with convolutional neural networks and surface fitting
Deep convolutional neural networks for automatic segmentation of left ventricle cavity from cardiac magnetic resonance images
Automatic age estimation from facial profile view
Computer-aided mammogram diagnosis system using deep learning convolutional fully complex-valued relaxation neural network classifier
- Author(s): Qian Liu and Chao Wang
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 663–674
- DOI: 10.1049/iet-cvi.2016.0294
- Type: Article
The key problem in colour face recognition is how to take full advantage of the colour information and extract effective discriminating features. To solve this problem, the authors propose a novel non-linear feature extraction approach for colour face recognition, named dual multi-kernel discriminating correlation analysis, which separately maps the different colour components of face images into different non-linear kernel spaces, and then performs multi-kernel learning and discriminant analysis with the correlation metric both within each colour component and between components. To choose the optimum kernel space for each colour component and the most suitable colour space for the approach, they design a kernel selection strategy and a colour space selection strategy, respectively. Experimental results on the Face Recognition Grand Challenge version 2 and Labelled Faces in the Wild databases validate the effectiveness of the proposed approach and the two strategies.
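The first stage, mapping each colour component into its own kernel space, can be illustrated with RBF kernels; the component names and gamma values below are assumptions, and the discriminant analysis step is omitted:

```python
import math

def rbf_kernel_matrix(samples, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for a list of vectors."""
    n = len(samples)
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(samples[i], samples[j])))
             for j in range(n)] for i in range(n)]

def per_component_kernels(colour_faces, gammas):
    """Map each colour component of a face set into its own kernel space.

    colour_faces: {component name: list of feature vectors}
    gammas: {component name: RBF bandwidth}, one kernel per colour component.
    """
    return {c: rbf_kernel_matrix(v, gammas[c]) for c, v in colour_faces.items()}
```

Multi-kernel learning would then combine these per-component Gram matrices, e.g. with learned weights.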
- Author(s): Zhe Sun ; Zheng-Ping Hu ; Meng Wang ; Shu-Huan Zhao
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 675–682
- DOI: 10.1049/iet-cvi.2016.0505
- Type: Article
Recently, researchers have proposed various feature descriptors to achieve robust facial expression recognition (FER), but finding a discriminative feature descriptor remains a critical task. In this paper, we propose a discriminative feature learning scheme to improve the representational power of expressions. First, we obtain a discriminative feature matrix (DFM) based on a pixel-difference representation. Subsequently, all DFMs corresponding to the training samples are used to construct a discriminative feature dictionary (DFD). Next, the DFD is projected onto a vertical two-dimensional linear discriminant analysis (V-2DLDA) space to compute the between-class and within-class scatter, because V-2DLDA works well with the DFD in matrix representation and achieves good efficiency. Finally, a nearest-neighbour (NN) classifier determines the labels of the query samples. The DFD represents local feature changes that are robust to variations such as expression and illumination, and V-2DLDA yields an optimal projection matrix that not only preserves the discriminative features but also reduces the dimensionality. The proposed method achieves accuracy rates of 91.87% on the CK+ database, 82.24% on the KDEF database and 78.94% on the CMU Multi-PIE database in the leave-one-subject-out (LOSO) scenario, outperforming the comparison methods.
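A pixel-difference representation of the kind the DFM builds on can be sketched as simple horizontal and vertical intensity differences over a grey-level image; the paper's exact construction may differ:

```python
def pixel_difference_matrix(image):
    """Horizontal and vertical pixel differences of a 2-D grey image.

    image: list of equal-length rows of intensities.
    Returns (h, v): h[i][j] = image[i][j+1] - image[i][j],
                    v[i][j] = image[i+1][j] - image[i][j].
    """
    h = [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in image]
    v = [[image[i + 1][j] - image[i][j] for j in range(len(image[0]))]
         for i in range(len(image) - 1)]
    return h, v
```

Such difference maps are insensitive to a constant brightness offset, which is one reason pixel-difference features tolerate illumination change.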
- Author(s): Elham Shabaninia ; Ahmad Reza Naghsh-Nilchi ; Shohreh Kasaei
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 683–690
- DOI: 10.1049/iet-cvi.2016.0373
- Type: Article
Although there is increasing interest in employing depth data in computer vision applications, the spatial resolution of depth maps remains limited compared with typical visible-light images. A novel method is proposed to synthetically improve the spatial resolution of a single depth image. It integrates higher-order terms into the Markov random field (MRF) formulation of example-based methods to improve their representational power, and inference is performed by approximately minimising the higher-order multi-label MRF energies. In addition, to improve the efficiency of inference, a hierarchical scheme on the number of MRF states is proposed: first, a large number of states is used to obtain an initial labelling by minimising only the first-order energies; then, the problem is solved for the higher-order energies over a smaller number of states. Performance comparisons show that the proposed method improves, both qualitatively and quantitatively, on first-order approaches based on a simple four-connected MRF graph structure.
- Author(s): Hongmin Liu ; Lu Li ; Zhiheng Wang ; Zhanqiang Huo
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 691–701
- DOI: 10.1049/iet-cvi.2016.0299
- Type: Article
This study investigates the problem of constructing binary descriptors and develops a novel one, the simple tri-bit binary descriptor (STBD), based on a simple sampling pattern (SSP) and a tri-value binarisation strategy (TBS). First, an SSP is proposed in which sample points are divided into two groups according to their distance from the pattern centre and smoothed with different circular filters. Then, to adapt the descriptor to the images being matched, a selection strategy that directly employs detected keypoints as training data is introduced to select 256 point pairs with low correlation from the initial pairs. Finally, a modified TBS method is presented to properly refine the intensity comparison results. Experiments show that the proposed STBD performs well and is robust to various transformations, except scale change.
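Tri-value binarisation of intensity comparisons can be sketched as follows; the two-bit encoding and the threshold value are illustrative assumptions, not the paper's exact refinement:

```python
def tri_value_code(pair_intensities, t=5):
    """Tri-value binarisation of intensity comparisons.

    Each (a, b) comparison yields +1 if a - b > t, -1 if a - b < -t, else 0,
    encoded here as two bits per pair: 10 for +1, 01 for -1, 00 for "similar".
    The extra "similar" state makes the code more stable under small noise
    than a plain one-bit sign test.
    """
    code = []
    for a, b in pair_intensities:
        d = a - b
        if d > t:
            code.extend([1, 0])
        elif d < -t:
            code.extend([0, 1])
        else:
            code.extend([0, 0])
    return code
```

Descriptors built this way can still be compared with a fast Hamming distance over the bit string.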
- Author(s): Meiling Gong ; Jinhui Lan ; Changlin Yang ; Hongtao Wu ; Tao Zhi
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 702–709
- DOI: 10.1049/iet-cvi.2016.0213
- Type: Article
Image segmentation is an important step in image processing, but contemporary segmentation algorithms suffer from poor noise robustness, over-segmentation and imprecise results. To address these problems, the authors propose an adaptive image segmentation algorithm constrained by edge posterior probability. The algorithm first resolves over-segmentation by improving the watershed algorithm, and then automatically decides whether to adopt the edge threshold segmentation produced by the watershed algorithm, based on the proposed edge posterior probability model. Experiments show that the algorithm has excellent noise robustness, produces highly precise segmentation results, and is effective at segmenting low-contrast images.
- Author(s): Jinfu Yang ; Ying Wang ; Guanghui Wang ; Mingai Li
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 710–716
- DOI: 10.1049/iet-cvi.2016.0469
- Type: Article
Salient object detection, a necessary step in many computer vision applications, has attracted extensive attention in recent years. A novel salient object detection method based on multi-superpixel-scale contrast is proposed. The saliency value of each superpixel is measured with a global score computed from the region's colour contrast and its spatial distances to all other regions in the image. High-level information is also incorporated to improve performance, and the saliency maps are fused across multiple levels using a modified multi-layer cellular automaton to yield a reliable final result. The proposed algorithm is evaluated against five state-of-the-art approaches on three standard public datasets; both quantitative and qualitative results demonstrate its effectiveness and efficiency.
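The global score can be sketched as colour contrast weighted by spatial proximity, so that nearby regions with different colours contribute most; the Gaussian weighting and the sigma value below are assumptions, not the paper's exact formula:

```python
import math

def global_contrast_saliency(regions, sigma=0.4):
    """Per-region saliency from colour contrast weighted by spatial proximity.

    regions: list of (colour vector, centre (x, y)) per superpixel,
    with centres in normalised [0, 1] image coordinates.
    """
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    saliency = []
    for i, (ci, pi) in enumerate(regions):
        s = 0.0
        for j, (cj, pj) in enumerate(regions):
            if i == j:
                continue
            # Closer regions get a larger weight on their colour difference.
            w = math.exp(-dist(pi, pj) ** 2 / (2 * sigma ** 2))
            s += w * dist(ci, cj)
        saliency.append(s)
    return saliency
```

A region whose colour differs from its surroundings receives the highest score, matching the intuition of global contrast.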
- Author(s): Xiangrong Wang and Jieyu Zhao
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 717–724
- DOI: 10.1049/iet-cvi.2016.0429
- Type: Article
Markov random fields (MRFs) are prominent models for image processing problems, but they face a model selection bottleneck: it is difficult to decide automatically how many objects an image contains. Motivated by Bayesian non-parametric (BN) models, a layered BN MRF is proposed. The model is hierarchical: the lower level is a random-field-like model, while the higher level is a Chinese restaurant process (CRP). The clustering procedure can be summarised as follows: the input data are first clustered with the lower-level MRF to form a set of components, and the higher-level CRP then merges the components into larger clusters; a split–merge Markov chain Monte Carlo scheme is employed for inference. Quantitative evaluations on the BSD500 and MSRC datasets show that the proposed model is comparable to state-of-the-art BN models and other graphical models on unsupervised distance-dependent problems.
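The higher-level Chinese restaurant process has a standard sampling form, sketched here as a generic CRP sampler (not the paper's inference code); its key property is that the number of clusters need not be fixed in advance:

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Sample a partition of n items from a CRP with concentration alpha.

    Item i joins an existing table with probability proportional to the
    table's current size, or opens a new table with probability
    proportional to alpha. Returns a table index per item.
    """
    rng = random.Random(seed)
    tables = []       # current table sizes
    assignment = []   # table index chosen for each item
    for i in range(n):
        weights = tables + [alpha]   # last slot = open a new table
        r = rng.uniform(0, i + alpha)  # sum(tables) == i
        acc = 0.0
        for t, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if t == len(tables):
            tables.append(1)   # new table
        else:
            tables[t] += 1     # join existing table
        assignment.append(t)
    return assignment
```

Larger alpha tends to produce more tables, which is how the model trades off merging components versus keeping them separate.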
- Author(s): Ho Gi Jung
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 725–732
- DOI: 10.1049/iet-cvi.2016.0317
- Type: Article
Recently, several methods have been published that reconstruct an image from a discriminative feature vector. This study shows that previous approaches, which minimise the histogram-of-oriented-gradients (HOG) feature error in the principal component analysis (PCA) domain of the learning database, cannot reflect the different dynamic range of each PCA dimension, and proposes an improved method that exploits the eigenvalue as a weighting factor for each PCA dimension. Experimental results on pedestrian and vehicle image databases show quantitatively that the proposed method improves the quality of the reconstructed images. The method is additionally applied to reconstructing images from the support vectors (SVs) produced by reduced-set construction, the best-performing SV number reduction method. Because these SVs do not correspond to any image in the learning database, the behaviour of reduced-set construction is otherwise hard to analyse; by inspecting the reconstructed SV images, a potential problem with the database used is identified, establishing a direction for further study to address it.
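One plausible reading of the proposed weighting, penalising the error in each PCA dimension in proportion to its eigenvalue (the variance, and hence dynamic range, of that axis), can be sketched as; this is an interpretation for illustration, not the paper's verified objective:

```python
def weighted_pca_error(coeffs, target_coeffs, eigenvalues):
    """Eigenvalue-weighted squared error between two PCA coefficient vectors.

    An unweighted sum treats every PCA axis equally even though a
    high-eigenvalue axis spans a much larger range of values; weighting by
    the eigenvalue makes a unit error on a dominant axis cost more.
    """
    return sum(l * (c - t) ** 2
               for l, c, t in zip(eigenvalues, coeffs, target_coeffs))
```

For example, an equal coefficient error on axes with eigenvalues 4 and 1 contributes 4 and 1 to the objective, respectively.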
- Author(s): Hongmei Zhu ; Jihao Yin ; Ding Yuan
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 733–743
- DOI: 10.1049/iet-cvi.2016.0446
- Type: Article
Stereo matching between binocular stereo images is fundamental to many computer vision tasks, such as three-dimensional (3D) reconstruction and robot navigation. The varied structures of real 3D scenes make stereo matching a long-standing yet still challenging problem. In this study, the authors propose a novel adaptive support weights technique that exploits the hierarchical information provided by multilevel segmentation to preserve robustness to imaging conditions and spatial proximity during cost aggregation. In addition, a generalisable cost refinement strategy is designed to remove matching ambiguity in large, weakly textured regions; it uses both the fluctuation of the filtered cost volume and the colour information to further improve matching accuracy. Experimental results on 50 stereo images demonstrate the effectiveness and efficiency of the proposed method. Furthermore, a systematic evaluation of the conventional steps in local stereo methods is provided, with reliable suggestions for beginners and for researchers outside the stereo matching field.
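The classic adaptive support weight (in the Yoon and Kweon style that this work extends with multilevel segmentation) can be sketched as follows; the gamma values are conventional defaults, assumed here for illustration:

```python
import math

def support_weight(colour_p, colour_q, pos_p, pos_q,
                   gamma_c=7.0, gamma_s=36.0):
    """Weight of neighbour pixel q when aggregating matching cost at centre p.

    The weight decays exponentially with both the colour difference and the
    spatial distance, so pixels that look like p and lie near p dominate
    the aggregated cost.
    """
    dc = math.sqrt(sum((a - b) ** 2 for a, b in zip(colour_p, colour_q)))
    ds = math.hypot(pos_p[0] - pos_q[0], pos_p[1] - pos_q[1])
    return math.exp(-dc / gamma_c - ds / gamma_s)
```

During cost aggregation each candidate disparity's cost is averaged over the window with these weights, which is what lets the window adapt to object boundaries.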
- Author(s): Sheng Yu ; Yun Cheng ; Li Xie ; Shao-Zi Li
- Source: IET Computer Vision, Volume 11, Issue 8, pp. 744–749
- DOI: 10.1049/iet-cvi.2017.0005
- Type: Article
Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have established impressive results for many image recognition tasks, but CNNs usually contain millions of parameters and are prone to overfitting when trained on small datasets, so they have not produced clearly superior performance over traditional methods for action recognition. In this study, the authors design a novel two-stream fully convolutional network architecture for action recognition that significantly reduces the number of parameters while maintaining performance. To exploit spatial-temporal features, a linear weighted fusion method fuses the two streams' feature maps, and a video pooling method constructs the video-level features. The authors also demonstrate that improved dense trajectories have a significant impact on action recognition. The method achieves state-of-the-art performance on two challenging datasets: UCF101 (93.0%) and HMDB51 (70.2%).
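The fusion and pooling steps can be sketched as follows; the equal stream weights and the choice of max pooling across frames are illustrative assumptions, not the authors' tuned settings:

```python
def fuse_and_pool(spatial_frames, temporal_frames, w_spatial=0.5):
    """Linearly fuse per-frame feature maps from the two streams, then
    max-pool across frames to obtain a single video-level feature.

    spatial_frames, temporal_frames: lists of equal-length flat feature
    vectors, one per frame, aligned between the two streams.
    """
    w_t = 1.0 - w_spatial
    # Linear weighted fusion of corresponding frame features.
    fused = [[w_spatial * s + w_t * t for s, t in zip(sf, tf)]
             for sf, tf in zip(spatial_frames, temporal_frames)]
    # Video pooling: keep the strongest response of each dimension.
    return [max(vals) for vals in zip(*fused)]
```

The pooled vector then serves as the video-level representation fed to the classifier.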
Within-component and between-component multi-kernel discriminating correlation analysis for colour face recognition
Discriminative feature learning-based pixel difference representation for facial expression recognition
High-order Markov random field for single depth image super-resolution
STBD: a simple tri-bit binary descriptor for point matching
Adaptive image segmentation algorithm under the constraint of edge posterior probability
Salient object detection based on global multi-scale superpixel contrast
Hierarchical non-parametric Markov random field for image segmentation
Analysis of reduced-set construction using image reconstruction from a HOG feature vector
SVCV: segmentation volume combined with cost volume for stereo matching
Fully convolutional networks for action recognition