IET Computer Vision
Volume 9, Issue 4, August 2015
Automatic method for white matter lesion segmentation based on T1-fluid-attenuated inversion recovery images
- Author(s): Tianming Zhan ; Yongzhao Zhan ; Zhe Liu ; Liang Xiao ; Zhihui Wei
- Source: IET Computer Vision, Volume 9, Issue 4, p. 447–455
- DOI: 10.1049/iet-cvi.2014.0121
- Type: Article
The authors propose a fast and effective solution for automatic segmentation of white matter lesions using T1 and fluid-attenuated inversion recovery (FLAIR) image modalities, with no need for manual segmentation or atlas registration. Initially, a brain tissue segmentation method segments the T1 image into cerebrospinal fluid (CSF), grey matter and white matter. Based on the resulting tissue segmentation, the region of interest (ROI) of the FLAIR image is created by subtracting the CSF from the FLAIR image. Subsequently, the authors calculate the z-score of the intensities in the ROI and define a threshold to perform a preliminary separation of abnormalities from normal tissues. The abnormalities obtained at this stage serve as prior knowledge for a modified level-set technique based on local Gaussian distribution fitting, which precisely detects the boundaries of the white matter lesions in the ROI. Because the local Gaussian distribution fitting energy is robust to the intensity inhomogeneity of MR data, the method is capable of precisely extracting lesion boundaries. Experimental analysis and quantitative comparisons with the peak-seeking and state-of-the-art white matter lesion segmentation (WMLS) techniques demonstrate that the algorithm is a stable and effective approach which significantly outperforms other trusted solutions for white matter lesion segmentation.
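The z-score prescreening step described in this abstract can be sketched as follows. This is a minimal NumPy illustration of the idea only; the function name, the threshold value and the toy data are assumptions, not taken from the paper:

```python
import numpy as np

def zscore_abnormality_mask(roi_intensities, threshold=2.5):
    """Flag voxels whose intensity z-score exceeds a threshold.

    roi_intensities: 1-D array of FLAIR intensities inside the ROI
    (CSF already subtracted). Returns a boolean mask of candidate
    white-matter-lesion voxels. The threshold value is illustrative.
    """
    mu = roi_intensities.mean()
    sigma = roi_intensities.std()
    z = (roi_intensities - mu) / sigma
    return z > threshold  # lesions appear hyperintense on FLAIR

# Toy example: mostly normal tissue plus two bright outliers.
rng = np.random.default_rng(0)
roi = np.concatenate([rng.normal(100.0, 5.0, 1000), [160.0, 170.0]])
mask = zscore_abnormality_mask(roi)
```

In the paper this mask is only a preliminary result; the level-set stage then refines the lesion boundaries.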
Stereo matching based on segmented B-spline surface fitting and accelerated region belief propagation
- Author(s): Jingzhou Huang
- Source: IET Computer Vision, Volume 9, Issue 4, p. 456–466
- DOI: 10.1049/iet-cvi.2014.0166
- Type: Article
The authors propose a new stereo matching algorithm based on an iterative optimisation framework including bi-cubic B-spline surface fitting and accelerated region belief propagation (BP). They first compute the initial cost and disparity map by the adaptive support-weight approach and then launch the iterative process, in which the disparity space image is refined via bi-cubic B-spline fitting and optimised via the accelerated region BP. The algorithm contains two innovations: (i) disparity space image refinement based on segmented bi-cubic B-spline surface fitting; and (ii) an accelerated region message passing approach for BP. The algorithm is verified on the Middlebury benchmark, and experimental results show that it is effective and achieves state-of-the-art accuracy.
Colour-feature dual discriminating correlation analysis for face recognition
- Author(s): Qian Liu
- Source: IET Computer Vision, Volume 9, Issue 4, p. 467–475
- DOI: 10.1049/iet-cvi.2013.0302
- Type: Article
Effectively exploiting colour information and extracting useful features is the key to colour face recognition. In this study, the authors propose a novel colour face recognition approach named colour-feature dual discriminating correlation analysis, which incorporates a correlation metric into the discriminant analysis technique and realises colour-feature discriminating correlation analysis not only within each colour component but also between different components. The public face recognition grand challenge version 2 database is employed as the test data. Experimental results illustrate that the proposed approach outperforms several representative colour face recognition methods.
Motion objects segmentation based on structural similarity background modelling
- Author(s): Yong Luo and Ye Peng Guan
- Source: IET Computer Vision, Volume 9, Issue 4, p. 476–488
- DOI: 10.1049/iet-cvi.2014.0261
- Type: Article
Efficiently segmenting moving objects from video is important in computer vision applications. A novel foreground segmentation approach is developed based on structural similarity background modelling, which responds quickly to sudden illumination changes and dynamic background. Both a structural similarity map and environmental variation parameters are taken as a dynamic feedback controller to update the background. A multi-modal feature fusion strategy is proposed to segment foregrounds in a dynamic cluttered scene without any prior hypothesis about the scenario content. Experiments on videos with challenging content have been performed, and a comparative study with state-of-the-art methods indicates the superior performance of the proposed method.
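The idea of gating background updates by structural similarity can be sketched in NumPy. This is a simplified toy version, not the authors' model: it scores non-overlapping blocks with a standard SSIM-style formula and blends the frame into the background only where the structure matches, so that true foreground does not pollute the model. The block size, `alpha` and `tau` parameters are illustrative assumptions:

```python
import numpy as np

def block_ssim(frame, background, block=8, c1=6.5025, c2=58.5225):
    """Simplified per-block structural similarity between a grey frame
    and the current background model (non-overlapping blocks). The
    constants c1, c2 follow the usual SSIM defaults for 8-bit data."""
    h, w = frame.shape
    h, w = h - h % block, w - w % block
    f = frame[:h, :w].reshape(h // block, block, w // block, block)
    b = background[:h, :w].reshape(h // block, block, w // block, block)
    mf, mb = f.mean(axis=(1, 3)), b.mean(axis=(1, 3))
    vf, vb = f.var(axis=(1, 3)), b.var(axis=(1, 3))
    cov = (f * b).mean(axis=(1, 3)) - mf * mb
    return ((2 * mf * mb + c1) * (2 * cov + c2)) / \
           ((mf ** 2 + mb ** 2 + c1) * (vf + vb + c2))

def update_background(background, frame, ssim_map, block=8, alpha=0.05, tau=0.7):
    """Blend the frame into the background only in background-like
    blocks (similarity above tau); alpha is the adaptation rate."""
    bg = background.astype(float).copy()
    for i in range(ssim_map.shape[0]):
        for j in range(ssim_map.shape[1]):
            if ssim_map[i, j] > tau:  # structure matches: adapt
                sl = np.s_[i * block:(i + 1) * block, j * block:(j + 1) * block]
                bg[sl] = (1 - alpha) * bg[sl] + alpha * frame[sl]
    return bg
```

In the paper the similarity map is combined with environmental variation parameters as a feedback controller; here the fixed threshold stands in for that feedback.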
Invariant foreground occupation ratio for scale adaptive mean shift tracking
- Author(s): Yi Song ; Shuxiao Li ; Chengfei Zhu ; Sheng Jiang ; Hongxing Chang
- Source: IET Computer Vision, Volume 9, Issue 4, p. 489–499
- DOI: 10.1049/iet-cvi.2014.0150
- Type: Article
The mean shift algorithm has been successfully introduced into computer vision as an efficient approach to visual tracking, but the tracker handles changes in object scale poorly. This study addresses the scale estimation problem of the mean shift tracker and proposes a novel method based on the invariant foreground occupation ratio. The foreground occupation ratio is defined as the proportion of foreground pixels in an image region. By analysing the foreground occupation ratio, the authors derive three simple properties. Using its scale-invariance property, an iterative approximation approach estimates the scale of the foreground in the current image. The scale value is modified by a weighting function and adjusted along the two axes with respect to the width and the height of the target. The scale estimation algorithm is then embedded in the mean shift tracker to give it scale adaptation for tracking. Experimental results show that, using the authors' method for object scale estimation, the mean shift tracker tracks the target efficiently when its scale changes continuously.
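The occupation-ratio idea can be sketched directly: measure the proportion of foreground pixels in a window and iteratively resize the window until the ratio matches a reference value from the previous frame. This is a toy illustration of the concept only; the fixed step size, stopping tolerance and function names are assumptions, not the authors' iterative approximation or weighting function:

```python
import numpy as np

def occupation_ratio(fg_mask, cx, cy, half_w, half_h):
    """Proportion of foreground pixels inside a centred image region."""
    region = fg_mask[cy - half_h:cy + half_h, cx - half_w:cx + half_w]
    return region.mean()

def estimate_scale(fg_mask, cx, cy, half_w, half_h, target_ratio, iters=20):
    """Grow/shrink the window until the occupation ratio matches the
    reference ratio (the scale-invariance property): too-high a ratio
    means the window is too tight, too-low means too loose."""
    for _ in range(iters):
        r = occupation_ratio(fg_mask, cx, cy, half_w, half_h)
        if abs(r - target_ratio) < 0.01:
            break
        if r > target_ratio:          # window too tight: enlarge
            half_w += 1; half_h += 1
        else:                         # window too loose: shrink
            half_w -= 1; half_h -= 1
    return half_w, half_h
```

For example, with a 20 × 20 foreground square and a reference ratio recorded when the window was 30 × 30, the iteration recovers the 30 × 30 window from a too-tight start.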
Scene text detection method based on the hierarchical model
- Author(s): Gang Zhou ; Yuehu Liu ; Liang Xu ; Zhenhong Jia
- Source: IET Computer Vision, Volume 9, Issue 4, p. 500–510
- DOI: 10.1049/iet-cvi.2014.0297
- Type: Article
As an important step in text-based information extraction systems, scene text detection has become a popular subject of research in recent years. In this study, the authors present a novel approach to robustly detect texts that vary in scale, colour, font, language and orientation in scene images. To segment candidate text connected components (CCs) from images, both local contrast and colour consistency are considered at the superpixel level. To filter out the non-text CCs, a hierarchical model is designed. This hierarchical model groups the CCs into three cascaded stages and is equipped with a well-designed classifier in each stage. Experimental results on the public ICDAR 2005 dataset and the MSRA-TD500 dataset show that their approach obtains better performance than other state-of-the-art methods.
Revamped fly-over for accurate colon visualisation in virtual colonoscopy
- Author(s): Marwa Ismail ; Salwa Elshazly ; Aly Farag ; Chuck Sites ; Robert Curtin ; Robert Falk ; Albert Seow ; Gerald Dryden
- Source: IET Computer Vision, Volume 9, Issue 4, p. 511–521
- DOI: 10.1049/iet-cvi.2014.0177
- Type: Article
This study revisits a visualisation technique for virtual colonoscopy known as virtual fly-over (FO). The method views the entire colon anatomy from above the centreline. It assigns two cameras located on opposite sides of the centreline, each of which is responsible for viewing one half of the colon. This approach has several advantages over related colon visualisation methods with regard to visibility coverage and polyp detection rate. However, the traditional FO implementation has a few drawbacks that hinder complete visualisation. For example, it can overlook polyps located on the line between the two halves, and it has no data-specific initialisation for cutting planes. In this study, the authors enhance the FO method in several respects: they resolve the cutting issue and improve the virtual camera setup to better visualise both the hidden structures of the colon surface and polyps that are difficult to locate. Quantitative validation of the revamped FO on 30 actual clinical datasets with complicated shapes and large volumes demonstrated an average surface visualisation rate of 99.5 ± 0.2%. Also, true and synthetic polyps with various shapes and sizes were used to clinically validate the proposed method. The detection rate is up to 100% on the tested sets.
Unsupervised discovery of human activities from long-time videos
- Author(s): Salma Elloumi ; Serhan Cosar ; Guido Pusiol ; Francois Bremond ; Monique Thonnat
- Source: IET Computer Vision, Volume 9, Issue 4, p. 522–530
- DOI: 10.1049/iet-cvi.2014.0311
- Type: Article
In this study, the authors propose a complete framework based on a hierarchical activity model to understand and recognise activities of daily living in unstructured scenes. At each time instant of a long video, the framework extracts a set of space-time trajectory features describing the global position of an observed person and the motion of his/her body parts. Human motion information is gathered in a new feature that the authors call perceptual feature chunks (PFCs). The set of PFCs is used to learn, in an unsupervised way, particular regions of the scene (topology) where the important activities occur. Using topologies and PFCs, the video is broken into a set of small events ('primitive events') that have a semantic meaning. The sequences of 'primitive events' and topologies are used to construct hierarchical models for activities. The proposed approach has been tested in a medical application monitoring patients suffering from Alzheimer's disease and dementia. The authors have compared their approach to their previous study and a rule-based approach. Experimental results show that the framework achieves better performance than existing works and has the potential to be used as a monitoring tool in medical field applications.
Cross-view face recognition via structured dictionary based domain shift
- Author(s): Xue Chen ; Chunheng Wang ; Baihua Xiao ; Xinyuan Cai
- Source: IET Computer Vision, Volume 9, Issue 4, p. 531–540
- DOI: 10.1049/iet-cvi.2014.0092
- Type: Article
View variation is a major challenge in face recognition. In this study, the authors propose a novel cross-view face recognition method that seeks potential intermediate domains between the source and target views to model the connection between faces under varying views. Specifically, each intermediate domain is associated with a dictionary subspace. Learning proceeds in two phases. First, the authors discriminatively train a sub-dictionary for each subclass of data, which then compose a structured dictionary of powerful reconstructive and discriminative capability on the source data. Secondly, the authors gradually adapt the source domain dictionary to the target domain by incrementally reducing the reconstruction error on the target data, which forms a smooth transition path connecting the source and target domains. Instead of updating the structured dictionary integrally, the authors develop a refined sub-dictionary-based updating algorithm, which makes the intermediate dictionaries fit the target data better and faster. Finally, the authors apply invariant sparse codes across the source, intermediate and target domains to render domain-shared representations, in which the sample differences caused by view changes are reduced. Experiments on the CMU-PIE and Multi-PIE datasets demonstrate the effectiveness of the proposed method.
Learning a non-linear combination of Mahalanobis distances using statistical inference for similarity measure
- Author(s): Eslam Mostafa ; Asem M. Ali ; Aly A. Farag
- Source: IET Computer Vision, Volume 9, Issue 4, p. 541–548
- DOI: 10.1049/iet-cvi.2014.0011
- Type: Article
In this study, the authors learn a similarity measure that discriminates between inter-class and intra-class samples from a statistical inference perspective. A non-linear combination of Mahalanobis distances is proposed to reflect the properties of a likelihood ratio test. Since an object's appearance is influenced by both the identity of the object and variations in the capturing process, the authors represent the feature vector, which is the difference between two samples in the difference space, as a sample drawn from a mixture of distributions. In the case of dissimilar samples, this mixture consists of the identities distribution and other distributions of the variations in the capturing process; in the case of similar samples, the mixture consists of the capturing-process variation distributions only. Using this representation, the proposed similarity measure accurately discriminates between inter-class and intra-class samples. To highlight its good performance, the proposed similarity measure is tested on different computer vision applications: face verification and person re-identification. To illustrate how the proposed learning method can easily be used on large-scale datasets, experiments are conducted on different challenging datasets: labelled faces in the wild (LFW), the public figures face database, ETHZ and VIPeR. Moreover, in these experiments, the authors evaluate the different stages, for example, feature detector, descriptor type and descriptor dimension, that constitute the face verification pipeline. The experimental results confirm that the learning method outperforms the state-of-the-art.
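The likelihood-ratio intuition behind combining Mahalanobis distances can be sketched in a few lines. This is a generic two-Gaussian log-likelihood-ratio score, not the authors' learned non-linear combination: difference vectors are scored under an intra-class ("same identity") covariance and an inter-class covariance, and the sign of the score decides same versus different:

```python
import numpy as np

def mahalanobis_sq(diff, cov_inv):
    """Squared Mahalanobis distance of a difference vector."""
    return float(diff @ cov_inv @ diff)

def llr_similarity(diff, intra_cov, extra_cov):
    """Log-likelihood-ratio-style similarity for a difference vector:
    log N(diff; 0, intra_cov) - log N(diff; 0, extra_cov), dropping the
    shared constant. Positive values favour "same identity"."""
    d_intra = mahalanobis_sq(diff, np.linalg.inv(intra_cov))
    d_extra = mahalanobis_sq(diff, np.linalg.inv(extra_cov))
    _, ld_intra = np.linalg.slogdet(intra_cov)
    _, ld_extra = np.linalg.slogdet(extra_cov)
    return 0.5 * (d_extra - d_intra) + 0.5 * (ld_extra - ld_intra)
```

With a tight intra-class covariance and a broad inter-class one, small difference vectors score positive (same) and large ones score negative (different), mirroring the decision rule the abstract describes.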
Efficient vanishing point detection method in complex urban road environments
- Author(s): Weili Ding and Yong Li
- Source: IET Computer Vision, Volume 9, Issue 4, p. 549–558
- DOI: 10.1049/iet-cvi.2014.0187
- Type: Article
Detecting the vanishing point in a road image is important for robot navigation, intelligent transportation and other fields. This study proposes a new vanishing point detection method that uses the vertical information in complex urban road and street environments. First, all straight lines in the road image are detected using curvature scale space and principal component analysis methods. Second, a new road region extraction method is proposed to support vanishing point detection; in this method, the image is decomposed into approximate sky, vertical and road regions using the estimation envelopes of the vertical lines. Third, the straight lines in the road region are extracted using a path perspective triangle and line length limits, and are then categorised into groups using the proposed grouping strategies. Finally, vanishing point candidates are calculated from paired lines extracted from different groups and from the same group, and the final vanishing point is estimated for the urban road scene using the mean shift clustering method. The experimental results show that the proposed algorithm can estimate the vanishing point accurately and efficiently in complex urban road environments, despite interference from vehicles and pedestrians and on curved and unstructured roads.
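The final step, accumulating vanishing point candidates from pairs of lines, can be sketched with homogeneous coordinates: a line through two points and the intersection of two lines are both cross products. This is a generic illustration of that geometry, with a simple median standing in for the paper's mean-shift clustering:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection of two homogeneous lines (None if parallel)."""
    x = np.cross(l1, l2)
    if abs(x[2]) < 1e-9:
        return None
    return x[:2] / x[2]

def vanishing_point(lines):
    """Candidate vanishing points from all line pairs, aggregated with
    a coordinate-wise median here for brevity (the paper uses mean
    shift clustering over the candidates)."""
    cands = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = intersection(lines[i], lines[j])
            if p is not None:
                cands.append(p)
    return np.median(np.array(cands), axis=0)
```

Three lines constructed to pass through the point (100, 50) recover that point exactly.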
Very large-scale integration architecture for video stabilisation and implementation on a field programmable gate array-based autonomous vehicle
- Author(s): Tahiyah Nou-Shene ; Vikramkumar Pudi ; K. Sridharan ; Vineetha Thomas ; J. Arthi
- Source: IET Computer Vision, Volume 9, Issue 4, p. 559–569
- DOI: 10.1049/iet-cvi.2014.0120
- Type: Article
Autonomous vehicles engaged in terrain exploration are typically equipped with a camera. The camera is subjected to vibration as the vehicle moves so that the videos captured require stabilisation to facilitate accurate interpretation by remote operators. Dedicated architectures for video stabilisation that offer high performance while consuming low area and power are desirable for this application. This study presents a pipelined very large-scale integration architecture. It is based on exploiting the separability property of the two-dimensional (2-D) Sobel matrix and the 2-D Gaussian filtering matrix to obtain an efficient corner point detection architecture. It also employs the coordinate rotation digital computer architecture for global motion vector calculation. The proposed architecture has been coded in Verilog and synthesised for a field programmable gate array (FPGA), which offers massive parallelism at fairly low power. The proposed architecture is shown to be highly area efficient. An FPGA-based autonomous vehicle has been fabricated, and experiments with a camera mounted on the vehicle are presented and analysed.
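The separability property the architecture exploits is easy to verify numerically: the 2-D Sobel kernel is the outer product of a 1-D smoothing vector and a 1-D derivative vector, so one 2-D convolution can be replaced by two cheap 1-D passes. The NumPy check below is illustrative of the mathematical property, not of the hardware design:

```python
import numpy as np

# The 2-D horizontal Sobel kernel factors as outer(smooth, deriv).
smooth = np.array([1.0, 2.0, 1.0])   # 1-D smoothing component
deriv = np.array([1.0, 0.0, -1.0])   # 1-D derivative component
sobel_x = np.outer(smooth, deriv)

def conv2d_separable(img, v, h):
    """Convolve with outer(v, h) via two 1-D 'same' passes."""
    tmp = np.apply_along_axis(lambda r: np.convolve(r, h, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, v, mode='same'), 0, tmp)

def conv2d_full(img, k):
    """Direct 2-D 'same' convolution (naive loops, zero padding)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # True convolution: correlate with the flipped kernel.
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k[::-1, ::-1])
    return out
```

For a 3 × 3 kernel on an N × N image, the separable form needs 6 multiplies per pixel instead of 9, and the saving grows with kernel size, which is why it maps well to a pipelined hardware datapath.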
Sequence-to-sequence alignment using a pendulum
- Author(s): Tomislav Pribanic ; Marko Lelas ; Igor Krois
- Source: IET Computer Vision, Volume 9, Issue 4, p. 570–575
- DOI: 10.1049/iet-cvi.2014.0075
- Type: Article
Analysing two or more video sequences of dynamic scenes typically requires time synchronisation between the sequences, and this alignment is not always possible in hardware. A typical software method must process the entire, frequently lengthy, imaged material, requiring additional computation that serves only for synchronisation. Software-based synchronisation methods impose, in essentially all cases, certain assumptions about the imaged three-dimensional (3D) scene and are suited to video material that has already been captured. The authors argue that there are applications where the unsynchronised video sequences have not yet been taken. Their time-efficient solution uses a pendulum consisting of a small ball, attached to a 50 cm string and suspended from a pivot so that it can swing freely. The authors estimate the time instant when the ball swings through the equilibrium position; the difference in these times for two cameras yields a subframe time difference between the cameras. The proposed method yields subframe differences that are statistically no different from ground truth data, and 3D reconstruction results for synchronised data clearly outperform those for unsynchronised data. The proposed method removes restrictions and assumptions about the 3D scene that will be imaged later, yet allows accurate subframe synchronisation in less than a second.
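The sub-frame estimate rests on a simple observation: the crossing instant falls between two frames, so interpolating the sign change of the ball's displacement locates it with sub-frame precision. The sketch below is an illustrative linear-interpolation version, not the authors' estimator:

```python
def equilibrium_crossing_time(positions, fps):
    """Sub-frame time at which the pendulum ball crosses equilibrium.

    positions: per-frame horizontal displacement of the ball relative
    to its equilibrium position (the sign changes at the crossing).
    The crossing instant is found by linear interpolation between the
    two frames that bracket the sign change.
    """
    for i in range(len(positions) - 1):
        a, b = positions[i], positions[i + 1]
        if a == 0:
            return i / fps
        if a * b < 0:               # sign change: crossing between frames
            frac = a / (a - b)      # linear interpolation fraction
            return (i + frac) / fps
    return None

# Two cameras film the same swing at 30 fps; camera B started later,
# so the same physical crossing appears at different time stamps.
cam_a = [-3.0, -1.0, 1.0, 3.0]
cam_b = [-2.0, 0.0, 2.0, 4.0]
offset = equilibrium_crossing_time(cam_a, 30) - equilibrium_crossing_time(cam_b, 30)
```

The difference `offset` (here half a frame period) is exactly the subframe synchronisation offset the paper recovers between the two cameras.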
Fast and accurate multi-view reconstruction by multi-stage prioritised matching
- Author(s): Markus Ylimäki ; Juho Kannala ; Jukka Holappa ; Sami S. Brandt ; Janne Heikkilä
- Source: IET Computer Vision, Volume 9, Issue 4, p. 576–587
- DOI: 10.1049/iet-cvi.2014.0281
- Type: Article
In this study, the authors propose a multi-view stereo reconstruction method which creates a three-dimensional point cloud of a scene from multiple calibrated images captured from different viewpoints. The method is based on a prioritised match expansion technique, which starts from a sparse set of seed points, and iteratively expands them into neighbouring areas by using multiple expansion stages. Each seed point represents a surface patch and has a position and a surface normal vector. The location and surface normal of the seeds are optimised using a homography-based local image alignment. The propagation of seeds is performed in a prioritised order in which the most promising seeds are expanded first and removed from the list of seeds. The first expansion stage proceeds until the list of seeds is empty. In the following expansion stages, the current reconstruction may be further expanded by finding new seeds near the boundaries of the current reconstruction. The prioritised expansion strategy allows efficient generation of accurate point clouds and their experiments show its benefits compared with non-prioritised expansion. In addition, a comparison to the widely used patch-based multi-view stereo software shows that their method is significantly faster and produces more accurate and complete reconstructions.
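The prioritised expansion strategy is essentially best-first region growing over a priority queue: the most promising seed is always expanded first and removed from the queue. The toy sketch below illustrates that control flow on a 2-D grid of scalar scores; in the actual method each seed is a surface patch whose position and normal are optimised photometrically, which is not modelled here:

```python
import heapq

def prioritised_expansion(scores, seeds, threshold=0.5):
    """Best-first expansion: pop the highest-score seed, accept its
    4-neighbours whose score clears the threshold, and push them onto
    the heap. Returns the accepted set and the expansion order."""
    heap = [(-scores[r][c], r, c) for r, c in seeds]
    heapq.heapify(heap)
    accepted = set(seeds)
    order = []
    while heap:
        _, r, c = heapq.heappop(heap)       # most promising seed first
        order.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(scores) and 0 <= nc < len(scores[0])
                    and (nr, nc) not in accepted
                    and scores[nr][nc] >= threshold):
                accepted.add((nr, nc))
                heapq.heappush(heap, (-scores[nr][nc], nr, nc))
    return accepted, order
```

Because the heap always yields the best remaining seed, well-matched regions are reconstructed before ambiguous ones, which is the source of the efficiency the abstract claims over non-prioritised expansion.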
Automatic image-based detection and inspection of paper fibres for grasping
- Author(s): Juha Hirvonen and Pasi Kallio
- Source: IET Computer Vision, Volume 9, Issue 4, p. 588–594
- DOI: 10.1049/iet-cvi.2014.0416
- Type: Article
An automatic computer vision algorithm is presented that detects individual paper fibres from an image, assesses the possibility of grasping the detected fibres with microgrippers and detects suitable grasping points. The goal of the algorithm is to enable automatic fibre manipulation for mechanical characterisation, which has traditionally been slow manual work. The algorithm classifies the objects in images based on their morphology, and detects the proper grasp points on the individual fibres by applying given geometrical constraints. The authors test the ability of the algorithm to detect individual fibres with 35 images containing more than 500 fibres in total, and also compare the graspability analysis and the calculated grasp points with the results of an experienced human operator on 15 images that contain a total of almost 200 fibres. The detection results are outstanding, with fewer than 1% of fibres missed. The graspability analysis gives a sensitivity of 0.83 and a specificity of 0.92, and the average distance between the grasp points chosen by the human and by the algorithm is 220 µm. Also, the choices made by the algorithm are much more consistent than the human choices.
Estimation of disparity map of stereo image pairs using spatial domain local Gabor wavelet
- Author(s): T. Malathi and M.K. Bhuyan
- Source: IET Computer Vision, Volume 9, Issue 4, p. 595–602
- DOI: 10.1049/iet-cvi.2014.0210
- Type: Article
The stereo matching problem takes two images captured by nearby cameras and attempts to recover quantitative disparity information. Most existing stereo matching algorithms find it difficult to estimate disparity in occluded, discontinuous and textureless regions of the images. Over the last few decades, a number of stereo matching methods have been proposed to overcome some of these problems. In the same line of thought, the authors propose a new feature-based stereo matching method, which consists of four basic steps: feature-based stereo correspondence, two-pass cost aggregation, disparity computation using winner-takes-all selection and, finally, disparity refinement. In the proposed method, local features of a Gabor wavelet in the spatial domain are used for matching cost computation, and subsequently a cost aggregation step is implemented by the combined use of the Kuwahara filter and the median filter. Experimental results on the Middlebury benchmark database show that the proposed method outperforms many existing local stereo matching methods.
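The winner-takes-all selection and a simple median-based refinement, two of the four steps listed above, can be sketched generically. The cost volume here is assumed to be the output of the aggregation stage; its construction from Gabor features and the Kuwahara filter is not shown:

```python
import numpy as np

def winner_takes_all(cost_volume):
    """Pick, per pixel, the disparity with minimum aggregated cost.

    cost_volume: array of shape (H, W, D) of matching costs, imagined
    as the output of the cost aggregation stage.
    """
    return np.argmin(cost_volume, axis=2)

def median_refine(disparity, k=3):
    """k x k median filtering of the disparity map: a common, simple
    refinement that removes isolated outlier disparities."""
    h, w = disparity.shape
    pad = k // 2
    padded = np.pad(disparity, pad, mode='edge')
    out = np.empty_like(disparity)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out
```

A single-pixel disparity spike surrounded by consistent values is removed by the median step, which is the kind of error the refinement stage targets.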
Text detection and recognition in natural scene with edge analysis
- Author(s): Chong Yu ; Yonghong Song ; Quan Meng ; Yuanlin Zhang ; Yang Liu
- Source: IET Computer Vision, Volume 9, Issue 4, p. 603–613
- DOI: 10.1049/iet-cvi.2013.0307
- Type: Article
Text plays an important role in daily life because of its rich information; thus, automatic text detection in natural scenes has many attractive applications. However, detecting and recognising such text remains a challenging problem. In this study, the authors propose a method which extends the widely used stroke width transform by two steps of edge analysis, namely candidate edge recombination and boundary classification; a method that recognises text based on the recombined candidate edges is also proposed. The candidate edge recombination step uses the idea of over-segmentation and region merging: to separate text edges from the background, the edge of the input image is first divided into small segments, and neighbouring edge segments are then merged if they have similar stroke width and colour. Through this step, each character is described by one candidate boundary. In the boundary classification step, candidate boundaries are aggregated into text chains, followed by chain classification using character-based and chain-based features. To recognise text, the grey image is extracted based on the location of each candidate edge after the recombination step; then, histogram of gradient features and a classifier are used to recognise each character. To evaluate the effectiveness of their method, the algorithm is run on the ICDAR competition dataset and the Street View Text database. The experimental results show that the proposed method provides promising performance in comparison with existing methods.
Face recognition: challenges, achievements and future directions
- Author(s): M. Hassaballah and Saleh Aly
- Source: IET Computer Vision, Volume 9, Issue 4, p. 614–626
- DOI: 10.1049/iet-cvi.2014.0084
- Type: Article
Face recognition has received significant attention because of its numerous applications in access control, law enforcement, security, surveillance, Internet communication and computer entertainment. Although significant progress has been made, state-of-the-art face recognition systems yield satisfactory performance only under controlled scenarios and degrade significantly when confronted with real-world scenarios, which involve unconstrained conditions such as illumination and pose variations, occlusion and expressions. Thus, plenty of challenges and opportunities remain. Recently, some researchers have begun to examine face recognition under unconstrained conditions. Instead of providing a detailed experimental evaluation, which has already been presented in the referenced works, this study serves more as a guide for readers. Its goal is to discuss the significant challenges involved in adapting existing face recognition algorithms to build successful systems that can be employed in the real world. It then discusses what has been achieved so far, focusing specifically on the most successful algorithms, and reviews their successes and failures. It also proposes several possible future directions for face recognition. It will thus be a good starting point for research projects on face recognition, as useful techniques can be isolated and past errors can be avoided.