IET Computer Vision
Volume 11, Issue 6, September 2017
- Author(s): Hanling Zhang ; Chenxing Xia ; Xiuju Gao
- Source: IET Computer Vision, Volume 11, Issue 6, p. 379 –388
- DOI: 10.1049/iet-cvi.2016.0492
- Type: Article
In this study, the authors propose a distinctive bottom-up visual saliency detection algorithm based on a new background prior and a new reinforcement. Inspired by genetic algorithms, the final map is obtained in three steps. First, the authors construct a background-based saliency map by manifold ranking, using superior image corners selected by a convex hull as the background prior; this differs from most existing background prior-based methods, which treat all image boundaries as background. Then, a better result is obtained by ranking the relevance of the image elements with foreground seeds extracted from the preliminary saliency map. Finally, a novel optimisation framework that integrates an energy function with a guided filter is introduced to refine the map. Experimental results on three public datasets indicate that the proposed method performs favourably against state-of-the-art algorithms.
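Manifold ranking in this line of saliency work typically propagates seed labels over a superpixel graph by solving a linear system. The following is a minimal numpy sketch of that step only — the toy graph, seed vector and propagation weight `alpha` are illustrative choices, not the authors' implementation:

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.5):
    """Rank graph nodes by relevance to seed nodes.

    W     : (n, n) symmetric affinity matrix between image elements
    y     : (n,) indicator vector of seed nodes (e.g. background corners)
    alpha : propagation strength in (0, 1)
    Solves f = (I - alpha * S)^-1 y, with S the symmetrically
    normalised affinity matrix.
    """
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)

# Toy graph: a chain of four superpixels with a seed at node 0;
# relevance decays with graph distance from the seed.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = manifold_ranking(W, np.array([1.0, 0.0, 0.0, 0.0]))
```

In the paper's pipeline the same ranking machinery is run twice — once with background seeds and once with foreground seeds extracted from the preliminary map.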
- Author(s): Lin Huafeng ; Li Jing ; Zhou Peiyun ; Liang Dachuan ; Li Dongmin
- Source: IET Computer Vision, Volume 11, Issue 6, p. 389 –397
- DOI: 10.1049/iet-cvi.2016.0169
- Type: Article
Since most existing saliency detection models are not suitable when the salient objects lie near the image border, the authors propose a saliency detection approach based on an adaptive background template (SCB) that works regardless of the position of the salient objects. First, a selection strategy is presented to establish the adaptive background template by removing the potential saliency superpixels from the image border regions, and the initial saliency map is obtained. Second, a propagation mechanism based on the K-means algorithm is designed to maintain the neighbourhood coherence of this saliency map. Finally, a new spatial prior is presented to integrate the saliency detection results by aggregating two complementary measures: image centre preference and background template exclusion. Comprehensive evaluations on six benchmark datasets indicate that the authors' method outperforms other state-of-the-art approaches. In addition, a new dataset containing 300 challenging images is constructed for evaluating the performance of various salient object detection methods.
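The border-superpixel selection step can be illustrated with a simple outlier rule: border superpixels whose appearance deviates strongly from the typical border colour are treated as potentially salient and excluded from the template. This is a hedged sketch of the idea only; the feature space, distance and 1.5-sigma threshold are illustrative choices, not the paper's:

```python
import numpy as np

def background_template(border_features, thresh=1.5):
    """Keep border superpixels that look like typical background.

    border_features : (n, d) appearance vectors (e.g. mean colours)
                      of the superpixels touching the image border
    thresh          : deviation threshold in standard deviations
                      (an illustrative choice, not the paper's)
    Returns a boolean mask: True = keep in the background template.
    """
    centre = border_features.mean(axis=0)
    dists = np.linalg.norm(border_features - centre, axis=1)
    return dists <= dists.mean() + thresh * dists.std()

# Five similar border superpixels plus one outlier (a salient object
# touching the border); the outlier is dropped from the template.
border = np.array([[0.20, 0.21, 0.19],
                   [0.22, 0.20, 0.20],
                   [0.19, 0.20, 0.21],
                   [0.21, 0.19, 0.20],
                   [0.20, 0.20, 0.20],
                   [0.90, 0.10, 0.10]])
mask = background_template(border)
```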
- Author(s): Mert Kilickaya ; Burak Kerim Akkus ; Ruket Cakici ; Aykut Erdem ; Erkut Erdem ; Nazli Ikizler-Cinbis
- Source: IET Computer Vision, Volume 11, Issue 6, p. 398 –406
- DOI: 10.1049/iet-cvi.2016.0286
- Type: Article
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep features-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model which depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results compared with the state-of-the-art models.
- Author(s): Chi-Yi Tsai ; Hsien-Chen Liao ; Kuang-Jui Hsu
- Source: IET Computer Vision, Volume 11, Issue 6, p. 407 –414
- DOI: 10.1049/iet-cvi.2016.0082
- Type: Article
Traffic sign recognition is a very important function in automatic driving assistance systems (ADAS). This study addresses the design and implementation of a vision-based ADAS based on an image-based speed-limit sign (SLS) recognition algorithm, which can automatically detect and recognise SLSs on the road in real time. To improve the recognition rate for SLSs with different orientations and scales in the image, this study also presents a new sign content description algorithm, which describes the detected road sign using centroid-to-contour (CtC) distances of the extracted sign content. The proposed CtC descriptor is robust to translation, rotation and scale changes of the SLS in the image. This advantage improves the recognition accuracy of a support vector machine classifier trained on a large database of traffic signs. The proposed SLS recognition method has been implemented on two different embedded platforms, each equipped with an ARM-based quad-core CPU running the Android 4.4 operating system. Experimental results validate that the proposed method not only provides a high recognition rate, but also achieves real-time performance of up to 30 frames per second when processing 1280 × 720 video streams on a commercial ARM-based smartphone.
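The centroid-to-contour idea can be sketched simply: distances from the shape centroid to its contour points, sorted by polar angle and normalised by the maximum distance, give a description unaffected by translation and scale (rotation robustness would additionally need a canonical starting point, e.g. rolling to the maximum entry). This is a minimal illustration, not the authors' exact descriptor:

```python
import numpy as np

def ctc_descriptor(contour):
    """Centroid-to-contour (CtC) style descriptor sketch.

    contour : (n, 2) array of contour points of the sign content.
    Distances to the centroid are sorted by polar angle and divided
    by their maximum, removing translation and scale dependence.
    """
    centroid = contour.mean(axis=0)
    v = contour - centroid
    d = np.hypot(v[:, 0], v[:, 1])
    order = np.argsort(np.arctan2(v[:, 1], v[:, 0]))
    return d[order] / d.max()

# A square contour and a scaled, translated copy of it yield the
# same descriptor, illustrating translation and scale robustness.
square = np.array([[0, 0], [1, 0], [2, 0], [2, 1],
                   [2, 2], [1, 2], [0, 2], [0, 1]], dtype=float)
moved = 3.0 * square + np.array([5.0, -2.0])
```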
- Author(s): Ithaya Rani Panner Selvam and Muneeswaran Karuppiah
- Source: IET Computer Vision, Volume 11, Issue 6, p. 415 –425
- DOI: 10.1049/iet-cvi.2016.0087
- Type: Article
Gender recognition is a challenging and innovative research topic in the present sophisticated world of visual technology. This study proposes a system that identifies gender from a face image. To locate the face region, each input image is divided into overlapping blocks and Gabor features are extracted at different scales and orientations. An enhanced feature is generated by concatenating the mean, standard deviation and skewness of the Gabor features obtained from each block. For detecting the face region, this feature is passed to an ensemble classifier. To recognise gender, reinforced local binary patterns are used to extract the facial local features, and the AdaBoost algorithm is used to select the discriminative features and classify the subject as male or female. The authors' experimental results on the Labeled Faces in the Wild (LFW), FERET and Gallagher databases for face detection using Gabor features achieve 98, 98.5 and 96.5% accuracy, respectively. Moreover, the reinforced local binary patterns achieve gender classification accuracies of 97.08, 98.5 and 94.21% on the LFW, FERET and Gallagher databases, respectively. Both results improve on other standard methodologies described in the literature.
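The enhanced feature described above — concatenating the mean, standard deviation and skewness of each block's Gabor responses — can be sketched directly. The Gabor filtering itself is assumed to have already been done (e.g. with OpenCV's `getGaborKernel`); this is an illustration, not the authors' code:

```python
import numpy as np

def block_stats(responses):
    """Mean, standard deviation and skewness of one block's Gabor
    filter responses (responses: flat array of response magnitudes)."""
    m = responses.mean()
    s = responses.std()
    skew = ((responses - m) ** 3).mean() / (s ** 3 + 1e-12)
    return np.array([m, s, skew])

def enhanced_feature(blocks):
    """Concatenate the three statistics over all overlapping blocks."""
    return np.concatenate([block_stats(b) for b in blocks])

# Two toy blocks: the first is symmetric about its mean, so its
# skewness is zero; the second is right-skewed.
b1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b2 = np.array([0.0, 0.0, 10.0])
f = enhanced_feature([b1, b2])
```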
- Author(s): Manuel I. López-Quintero ; Manuel J. Marín-Jiménez ; Rafael Muñoz-Salinas ; Rafael Medina-Carnicer
- Source: IET Computer Vision, Volume 11, Issue 6, p. 426 –433
- DOI: 10.1049/iet-cvi.2016.0249
- Type: Article
This study targets 2D articulated human pose estimation (i.e. localisation of body limbs) in stereo videos. Although depth-based devices (e.g. Microsoft Kinect) have gained popularity in recent years because they perform very well in controlled indoor environments (e.g. living rooms, operating theatres or gyms), they suffer clear problems in outdoor scenarios, so human pose estimation remains an interesting unsolved problem. The authors propose a novel approach that localises upper-body keypoints (i.e. shoulders, elbows and wrists) in temporal sequences of stereo image pairs. The method starts by locating and segmenting people in the image pairs using disparity and appearance information. Then, a set of candidate body poses is computed for each view independently. Finally, temporal and stereo consistency constraints are applied to estimate a final 2D pose. The authors validate their model on three challenging datasets: 'stereo human pose estimation dataset', 'poses in the wild' and 'INRIA 3DMovie'. The experimental results show that the model not only establishes new state-of-the-art results on stereo sequences, but also brings improvements in monocular sequences.
- Author(s): Murtaza Aslam ; Fozia Rajbdad ; Shahid Khattak ; Shoaib Azmat
- Source: IET Computer Vision, Volume 11, Issue 6, p. 434 –447
- DOI: 10.1049/iet-cvi.2016.0406
- Type: Article
Anthropometric dimensions, such as lengths, heights, breadths, circumferences and their ratios, are highly significant in healthcare, security, sports, clothing, and the tools and equipment industry. In this study, an automatic and precise method for measuring anthropometric dimensions of the human body from two-dimensional images is proposed. The dimensions are obtained using fiducial points detected from frontal and lateral views of body silhouettes. Primary anthropometric dimensions, which include heights, breadths, depths and lengths, are obtained by calculating the difference between two relevant fiducial points. The secondary dimensions are derived from the primary ones: ratios are obtained directly from primary dimensions, and circumference dimensions are estimated precisely using an ellipsoid model. A total of 75 dimensions, i.e. 51 primary and 24 secondary, are obtained, three times the number acquired by the state-of-the-art method. The accuracy of the acquired dimensions is verified by comparison with manual measurements using the standard parameter of maximum allowable error. The mean absolute difference of all dimensions obtained by the proposed method lies within the limits of maximum allowable error. More importantly, the mean absolute difference for the majority of dimensions (20 out of 24) is significantly smaller for the proposed method than for the best method in the existing literature.
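An elliptical cross-section model of the kind mentioned above lets a circumference be estimated from just two silhouette measurements: the breadth seen in the frontal view and the depth seen in the lateral view, taken as the two ellipse axes. The sketch below uses Ramanujan's perimeter approximation; the paper's exact formula may differ:

```python
import math

def circumference_from_views(breadth, depth):
    """Estimate a body circumference from an elliptical cross-section.

    breadth : width measured in the frontal silhouette
    depth   : width measured in the lateral silhouette
    Uses Ramanujan's approximation for the perimeter of an ellipse
    with semi-axes a = breadth/2 and b = depth/2.
    """
    a, b = breadth / 2.0, depth / 2.0
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1.0 + 3.0 * h / (10.0 + math.sqrt(4.0 - 3.0 * h)))
```

For a circular cross-section (breadth equal to depth) this reduces exactly to π times the diameter, which serves as a quick sanity check.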
- Author(s): Marco Marcon ; Augusto Sarti ; Stefano Tubaro
- Source: IET Computer Vision, Volume 11, Issue 6, p. 448 –454
- DOI: 10.1049/iet-cvi.2016.0193
- Type: Article
A multi-camera rig calibration algorithm based on a double-sided planar target is proposed. Owing to their inherently simple realisation, low cost and accuracy, planar calibration targets have become one of the most widely adopted calibration tools for both intrinsic and extrinsic camera parameters. However, for the estimation of extrinsic parameters, one of the major drawbacks of these targets is that they must be visible from both cameras, which rules out many configurations, e.g. two cameras facing each other. An inexpensive solution is to print or paste a planar pattern on both sides of the target; however, the relative misalignment between the patterns on the two sides and the target thickness may be unknown. The authors propose a solution in which the double-sided target displacement error is estimated together with the extrinsic parameters, allowing all available planar calibration tools to be reused in less constrained configurations. To assess their approach, the authors tested the system in two scenarios, one using two professional 4K cameras and one using two smartphones.
- Author(s): Husheng Dong ; Shengrong Gong ; Chunping Liu ; Yi Ji ; Shan Zhong
- Source: IET Computer Vision, Volume 11, Issue 6, p. 455 –462
- DOI: 10.1049/iet-cvi.2016.0265
- Type: Article
Distance metric learning has achieved great success in person re-identification. Most existing methods that learn metrics from pairwise constraints suffer from the problem of imbalanced data. In this study, the authors present a large margin relative distance learning (LMRDL) method that learns the metric from triplet constraints, so the problem of imbalanced sample pairs is bypassed. Differently from existing triplet-based methods, LMRDL employs an improved triplet loss that penalises the triplets with minimal inter-class distance, which leads to a more stringent constraint to guide the learning. To suppress the large variations of pedestrian appearance across camera views, the authors propose to learn the metric over the intra-class subspace. The proposed method is formulated as a logistic metric learning problem with a positive semi-definite constraint, and the authors derive an efficient optimisation scheme based on the accelerated proximal gradient approach. Experimental results show that the proposed method achieves state-of-the-art performance on three challenging datasets (VIPeR, PRID450S, and GRID).
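The improved triplet idea described above — penalising a triplet only through its hardest (minimal-distance) negative — can be sketched as a hinge with a margin. The `margin` value and toy distances are illustrative, and the paper's full logistic formulation with its PSD-constrained optimisation is not reproduced here:

```python
import numpy as np

def triplet_hinge_loss(dist_ap, dist_an, margin=1.0):
    """Hinge loss on the hardest negative of one anchor.

    dist_ap : distance between the anchor and its positive match
    dist_an : distances between the anchor and its negatives;
              only the minimal (hardest) one is penalised
    Loss is zero once the positive beats every negative by `margin`.
    """
    hardest = float(np.min(dist_an))
    return max(0.0, margin + dist_ap - hardest)
```

Enforcing the margin against the hardest negative is the "more stringent constraint" the abstract refers to: if the constraint holds there, it holds for every other negative automatically.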
- Author(s): Zuofeng Zhong ; Yong Xu ; Zuoyong Li ; Yinnan Zhao
- Source: IET Computer Vision, Volume 11, Issue 6, p. 463 –470
- DOI: 10.1049/iet-cvi.2016.0426
- Type: Article
Robustness is an important factor for background modelling in various scenarios. The current pixel-based adaptive segmentation method cannot effectively handle diverse objects simultaneously. To address this problem, a background modelling method using a discriminative motion representation is proposed in this study. Instead of simply using intensity to construct the background model, the proposed method extracts a new local descriptor that uses a weighted combination of differential excitations for each pixel to enhance the discriminability of pixels. On the basis of this background model, different categories of objects can be quickly identified by a simple but effective classification rule and accurately represented in the background model through a smart selection of updating strategies. The authors' background modelling method can therefore generate a complete representation of static objects and decrease false detections caused by dynamic background or illumination variations. Extensive experiments demonstrate that the proposed method achieves better foreground detection than state-of-the-art methods. In addition, it provides a computationally efficient algorithm for foreground detection tasks.
- Author(s): Yan Zhang and Hua Peng
- Source: IET Computer Vision, Volume 11, Issue 6, p. 471 –478
- DOI: 10.1049/iet-cvi.2016.0322
- Type: Article
One sample per person (OSPP) face recognition is a challenging problem in the face recognition community. A lack of samples is the main reason most algorithms fail in the OSPP setting. In this study, the authors propose a new algorithm that uses a deep autoencoder to generalise the intra-class variations of multi-sample subjects to single-sample subjects and reconstruct new samples. In the proposed algorithm, a generalised deep autoencoder is first trained with all images in the gallery; then a class-specific deep autoencoder (CDA) is fine-tuned for each single-sample subject with its single sample. Samples of the multi-sample subject most similar to the single-sample subject are input to the corresponding CDA to reconstruct new samples. For classification, minimum L2 distance, principal component analysis, a sparse representation-based classifier and softmax regression are used. Experiments on the Extended Yale Face Database B, the AR database and the CMU PIE database demonstrate the validity of the proposed algorithm.
- Author(s): Jiazhong Chen ; Jie Chen ; Hua Cao ; Rong Li ; Tao Xia ; Hefei Ling ; Yang Chen
- Source: IET Computer Vision, Volume 11, Issue 6, p. 479 –487
- DOI: 10.1049/iet-cvi.2016.0453
- Type: Article
In the existing local and global consistency (LGC) framework, the cost functions related to the classifying functions adopt the sum of each row of the weight matrix as an important factor. Some of these classifying functions have been successfully applied to saliency detection. From the viewpoint of saliency detection, this factor is inversely proportional to the colour contrast between image regions and their surroundings. However, an image region with high colour contrast against its surroundings is not necessarily a salient region. Therefore, a suitable variant of LGC is introduced by removing this factor from the cost function, and a suitable classifying function (SCF) is determined. A saliency detection method is then presented that utilises the SCF, a content-based initial label assignment scheme, and an appearance-based label assignment scheme. By updating the content-based initial labels and appearance-based labels with the SCF, a coarse saliency map and several intermediate saliency maps are obtained. Furthermore, to enhance detection accuracy, a novel optimisation function is presented that fuses the intermediate saliency maps with high detection performance to generate the final saliency map. Numerous experimental results demonstrate that the proposed method achieves competitive performance against recent state-of-the-art saliency detection algorithms.
- Author(s): Muhammad Shehzad Hanif ; Shafiq Ahmad ; Khurram Khurshid
- Source: IET Computer Vision, Volume 11, Issue 6, p. 488 –496
- DOI: 10.1049/iet-cvi.2016.0487
- Type: Article
In this study, the authors propose two improvements to a baseline tracker that employs the tracking-by-detection framework. First, they explore different feature spaces by employing features commonly used in object detection to improve the detector's performance. Second, they propose a robust scale estimation algorithm that estimates the size of the object in the current frame. Experimental results on the challenging online tracking benchmark-13 dataset show that a reduced-dimensionality histogram of oriented gradients boosts the performance of the tracker. The proposed scale estimation algorithm provides a significant gain and reduces tracker failures in challenging scenarios. The improved tracker is compared with 13 state-of-the-art trackers. The quantitative and qualitative results show that its performance is comparable with the state of the art under initialisation errors, variations in illumination, scale and motion, out-of-plane and in-plane rotations, deformations and low resolution.
- Author(s): Fenglei Wang ; Qiang Guo ; Jun Lei ; Jun Zhang
- Source: IET Computer Vision, Volume 11, Issue 6, p. 497 –504
- DOI: 10.1049/iet-cvi.2016.0417
- Type: Article
Text recognition in natural scenes remains a challenging problem owing to the highly variable appearance of text in unconstrained conditions. The authors develop a system that directly transcribes scene text images to text without character segmentation, formulating the problem as sequence labelling. They build a convolutional recurrent neural network using deep convolutional neural networks (CNNs) to model text appearance and recurrent neural networks (RNNs) to model sequence dynamics. The two models are complementary in modelling capability and are integrated to form the segmentation-free system. A Gaussian mixture model–hidden Markov model is trained to supervise the training of the CNN model, so the system is data driven and needs no hand-labelled training data. The method has several appealing properties: (i) it can recognise text images of arbitrary length; (ii) the recognition process does not involve sophisticated character segmentation; (iii) it is trained on scene text images with only word-level transcriptions; and (iv) it can recognise both lexicon-based and lexicon-free text. The proposed system achieves competitive performance compared with the state of the art on several public scene text datasets, including both lexicon-based and lexicon-free ones.
Robust saliency detection via corner information and an energy function
Saliency detection using adaptive background template
Data-driven image captioning via salient region discovery
Real-time embedded implementation of robust speed-limit sign recognition using a novel centroid-to-contour description method
Gender recognition based on face image using reinforced local binary patterns
Mixing body-parts model for 2D human pose estimation in stereo videos
Automatic measurement of anthropometric dimensions using frontal and lateral silhouettes
Multicamera rig calibration by double-sided thick checkerboard
Large margin relative distance learning for person re-identification
Background modelling using discriminative motion representation
Sample reconstruction with deep autoencoder for one sample per person face recognition
Saliency detection using suitable variant of local and global consistency
On the improvement of foreground–background model-based object tracker
Convolutional recurrent neural networks with hidden Markov model bootstrap for scene text recognition
- Source: IET Computer Vision, Volume 11, Issue 6, p. 505
- DOI: 10.1049/iet-cvi.2017.0169
- Type: Article
Erratum: Enhanced X-ray image segmentation method using prior shape
Most cited content for this Journal
-
Brain tumour classification using two-tier classifier with adaptive segmentation technique
- Author(s): V. Anitha and S. Murugavalli
- Type: Article
-
Driving posture recognition by convolutional neural networks
- Author(s): Chao Yan ; Frans Coenen ; Bailing Zhang
- Type: Article
-
Local directional mask maximum edge patterns for image retrieval and face recognition
- Author(s): Santosh Kumar Vipparthi ; Subrahmanyam Murala ; Anil Balaji Gonde ; Q.M. Jonathan Wu
- Type: Article
-
Fast and accurate algorithm for eye localisation for gaze tracking in low-resolution images
- Author(s): Anjith George and Aurobinda Routray
- Type: Article
-
‘Owl’ and ‘Lizard’: patterns of head pose and eye pose in driver gaze classification
- Author(s): Lex Fridman ; Joonbum Lee ; Bryan Reimer ; Trent Victor
- Type: Article