IET Computer Vision
Print ISSN 1751-9632
Online ISSN 1751-9640
IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, but not forgetting those works that aim to introduce new horizons and set the agenda for future avenues of research in Computer Vision.
This publication was previously known as IEE Proceedings - Vision, Image and Signal Processing 1994-2006. ISSN 1350-245X.
Latest content
-
Editorial: Invited papers from Editorial Board Members
- Author(s): E.R. Hancock
p. 499 (1)
-
Many-to-many feature matching in object recognition: a review of three approaches
- Author(s): A. Shokoufandeh; Y. Keselman; M.F. Demirci; D. Macrini; S. Dickinson
p. 500–513 (14)
The mainstream object categorisation community relies heavily on object representations consisting of local image features, due to their ease of recovery and their attractive invariance properties. Object categorisation is therefore formulated as finding, that is, ‘detecting’, a one-to-one correspondence between image and model features. This assumption breaks down for categories in which two exemplars may not share a single local image feature. Even when objects are represented as more abstract image features, a collection of features at one scale (in one image) may correspond to a single feature at a coarser scale (in the second image). Effective object categorisation therefore requires the ability to match features many-to-many. In this paper, we review our progress on three independent object categorisation problems, each formulated as a graph matching problem and each solving the many-to-many graph matching problem in a different way. First, we explore the problem of learning a shape class prototype from a set of class exemplars which may not share a single local image feature. Next, we explore the problem of matching two graphs in which correspondence exists only at higher levels of abstraction, and describe a low-dimensional, spectral encoding of graph structure that captures the abstract shape of a graph. Finally, we embed graphs into geometric spaces, reducing the many-to-many graph-matching problem to a weighted point matching problem, for which efficient many-to-many matching algorithms exist.
-
Saliency in images and video: a brief survey
- Author(s): K. Duncan; S. Sarkar
p. 514–523 (10)
Salient image regions permit non-uniform allocation of computational resources. The selection of a commensurate set of salient regions is often a step taken in the initial stages of many computer vision algorithms, thereby facilitating object recognition, visual search and image matching. In this study, the authors survey the role and advancement of saliency algorithms over the past decade. The authors first offer a concise introduction to saliency. Next, the authors present a summary of saliency literature cast into their respective categories then further differentiated by their domains, computational methods, features, context and use of scale. The authors then discuss the achievements and limitations of the current state of the art. This information is augmented by an outline of the datasets and performance measures utilised as well as the computational techniques pervasive in the literature.
-
Multimodal imaging: modelling and segmentation with biomedical applications
- Author(s): A.M. Ali; A.A. Farag; N. Alajlan; A.A. Farag
p. 524–539 (16)
The maximum a posteriori (MAP) technique, combining intensity and spatial interactions, has been a standard statistical approach for image segmentation. Crucial steps for the MAP technique are model identification, incorporation of priors, and the optimisation approach. This paper describes an unsupervised MAP segmentation framework for N-dimensional multimodal images. The input image and its desired labelling are described by a joint Markov-Gibbs random field (MGRF) model of independent image signals and interdependent region labels. A kernel approach is used to model the joint and marginal probability densities of objects from the gray level histogram, incorporating a generalised linear combination of Gaussians (LCG). A novel maximum likelihood estimate (MLE) for the number of classes in the LCG model is introduced. An approach is devised for MGRF model identification based on region characteristics. The segmentation process employs the LCG to provide an initial segmentation, then the α-expansion move algorithm iteratively refines the labelled image using the MGRF. The resulting MAP algorithm is studied in terms of convergence and sensitivity to initialisation, improper estimation of the number of classes, and discontinuities in the objects. The framework is modular, allowing incorporation of intensity and spatial interactions with varying complexity, and can be extended to incorporate shape priors.
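As a rough illustration of the intensity-only initial segmentation step, the sketch below assigns each pixel the class that maximises prior times Gaussian likelihood. It is a simplification under stated assumptions, not the paper's method: each class is modelled by a single Gaussian rather than an LCG, the mixture parameters are taken as already estimated, and no MGRF refinement is performed. The names `gaussian_pdf`, `initial_labels` and the example parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def initial_labels(pixels, classes):
    """Label each gray level with the class maximising prior * likelihood.

    classes is a list of (prior, mean, std) tuples; these parameters are
    assumed to be given (hypothetical values in the example below).
    """
    labels = []
    for x in pixels:
        scores = [prior * gaussian_pdf(x, mean, std)
                  for prior, mean, std in classes]
        labels.append(max(range(len(classes)), key=scores.__getitem__))
    return labels


# Two hypothetical classes: a dark one and a bright one.
classes = [(0.6, 50.0, 10.0), (0.4, 150.0, 20.0)]
pixels = [40, 55, 90, 140, 200]
labels = initial_labels(pixels, classes)
```

In a full pipeline of the kind the abstract describes, a labelling like this would only seed the spatial refinement stage.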
-
Backgroundless detection of pedestrians in cluttered conditions based on monocular images: a review
- Author(s): D. Simonnet; S.A. Velastin; E. Turkbeyler; J. Orwell
p. 540–550 (11)
The significant progress in visual surveillance has been motivated by the need to emulate some of the human ability to monitor activity in human-made environments, particularly in the contexts of security and safety. The rapid rise in the number of cameras installed in public and private places makes such automation desirable, at least to reduce CCTV workload. Real-world applications of visual surveillance impose the need for robust real-time solutions, able to deal with a wide range of circumstances and environmental conditions. Conventional approaches are based on what has become known as motion (or change) detection followed by tracking (in single or multiple camera systems). Objects of interest are represented by rectangular blobs, and decisions on whether something might be interesting are based on rules or learned patterns of the presence and trajectories of such blobs. There is growing interest in looking ‘inside the box’ for applications that are concerned with detailed human activity recognition and with robust detection of people even when image backgrounds change, as is the case with a moving camera. In this study, the authors consider the general problem of robust pedestrian detection irrespective of background, reviewing the state of the art, showing some representative results and suggesting ways forward.
-
Sparse local discriminant projections for discriminant knowledge extraction and classification
- Author(s): Z. Lai
p. 551–559 (9)
One of the major disadvantages of linear dimensionality reduction algorithms, such as principal component analysis (PCA) and linear discriminant analysis (LDA), is that the projections lack physical interpretation. Moreover, which features or variables play an important role in feature extraction and classification in classical linear dimensionality reduction methods has not been well investigated. This paper proposes a novel supervised learning method called sparse local discriminant projections (SLDP) for linear dimensionality reduction. Unlike recent manifold-learning-based methods such as locality preserving projections (LPP), SLDP introduces a sparse constraint into the objective function and integrates the local geometry, discriminant information and within-class geometry to obtain the sparse projections. The sparse projections can be efficiently computed by the Elastic Net. Most importantly, the sparse projections learned by SLDP have a direct physical interpretation and provide discriminant knowledge and insight into the extracted features. The experimental results show that SLDP gives reasonable semantic results and achieves competitive performance compared with techniques such as PCA, LPP, neighbourhood preserving embedding (NPE) and the recently proposed unified sparse subspace learning (USSL).
-
Empirical investigation into the correlation between vignetting effect and the quality of sensor pattern noise
- Author(s): C.-T. Li; R. Satta
p. 560–566 (7)
The sensor pattern noise (SPN) is a unique attribute of the content of images that can facilitate identification of source digital imaging devices. Owing to its potential in forensic applications, it has drawn much attention in the digital forensic community. Although much work has been done on the applications of the SPN, investigations into its characteristics have been largely overlooked in the literature. In this study, the authors aim to fill this gap by providing insight into the dependency of the SPN quality on its location in images. They have observed that the SPN components at the image periphery are not reliable for the task of source camera identification, and tend to cause higher false-positive rates. Empirical evidence is presented in this work. The authors suspect that this location-dependent SPN quality degradation has a strong connection with the so-called ‘vignetting effect’, as both exhibit the same type of location dependency. The authors recommend that when image blocks are to be used for forensic investigations, they should be taken from the image centre before SPN extraction is performed in order to reduce the false-positive rate.
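The recommendation to take blocks from the image centre can be sketched as a simple centred crop. This is only an illustration: the function name `central_block`, the list-of-rows image representation and the block size are assumptions, and the SPN extraction itself is not shown.

```python
def central_block(image, block_h, block_w):
    """Return the block_h x block_w block centred in a 2-D image (list of rows)."""
    height = len(image)
    width = len(image[0]) if height else 0
    if block_h > height or block_w > width:
        raise ValueError("block is larger than the image")
    top = (height - block_h) // 2
    left = (width - block_w) // 2
    return [row[left:left + block_w] for row in image[top:top + block_h]]


# Toy 6x6 "image" whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(6)]
block = central_block(image, 2, 2)
```

Any noise-residual extraction would then operate on `block` rather than on the full frame, so the peripheral pixels never enter the comparison.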
-
Non-linear factorised dynamic shape and appearance models for facial expression analysis and tracking
- Author(s): C.-S. Lee; A. Elgammal
p. 567–580 (14)
Facial expressions exhibit non-linear shape and appearance deformations that vary across people and expressions. The authors present a non-linear factorised shape and appearance model for facial expression analysis and tracking. The novel non-linear factorised generative model of facial expressions, using conceptual manifold embedding and empirical kernel maps, provides accurate facial expression shape and appearance. It preserves non-linear facial deformations based on the configuration, face style and expression type. The proposed model supports tasks such as facial expression recognition, person identification and global and local facial motion tracking. Given a sequence of images, temporal embedding, expression type and person identification parameters are iteratively estimated for facial expression analysis. The authors combine global facial motion estimation and local facial deformation estimation to track both large global and subtle local facial motion. The authors employ local facial motion deformation estimation using a thin-plate spline for subtle facial motion tracking. The global shape and appearance model provides appearance templates for the estimation of local deformation. Experimental results using Cohn–Kanade AU-coded facial expressions demonstrate facial expression recognition using the estimated personal style parameter, and facial deformation tracking using global and local facial motion estimation.
-
Adaptive pattern spectrum image description using Euclidean and Geodesic distance without training for texture classification
- Author(s): V. González-Castro; E. Alegre; O. García-Olalla; L. Fernández-Robles; M.T. García-Ordás
p. 581–589 (9)
Mathematical morphology can be used to extract a shape–size distribution called the pattern spectrum (PS) for texture description. However, the structuring element (SE) used to compute it does not vary across the image, and therefore it does not capture the image's geometrical variations. The authors' proposal consists of computing an SE at each pixel whose size and shape vary with two distance criteria, a geodesic distance and a Euclidean distance, in order to fit the texture as well as possible. Combining the geodesic and the Euclidean descriptors into a single descriptor, the classification results on several textures from the VisTex and Brodatz databases show that this approach outperforms the classical PS and the geodesic and Euclidean descriptors used separately and, in contrast with other adaptive methods, it does not require previous training.
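For orientation, the classical (non-adaptive) pattern spectrum that the abstract takes as its baseline can be sketched as the area removed between openings with successively larger structuring elements. The code below is a minimal sketch for binary images with a fixed square SE; it does not implement the adaptive geodesic or Euclidean SEs proposed in the paper, and all function names are hypothetical.

```python
def erode(img, k):
    """Binary erosion by a (2k+1)x(2k+1) square; outside the image counts as 0."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-k, k + 1) for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]


def dilate(img, k):
    """Binary dilation by a (2k+1)x(2k+1) square."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-k, k + 1) for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]


def opening(img, k):
    """Morphological opening: erosion followed by dilation with the same SE."""
    return dilate(erode(img, k), k)


def area(img):
    return sum(sum(row) for row in img)


def pattern_spectrum(img, max_k):
    """PS[k] = area(opening with size k) - area(opening with size k + 1)."""
    areas = [area(opening(img, k)) for k in range(max_k + 2)]
    return [areas[k] - areas[k + 1] for k in range(max_k + 1)]


# 8x8 test image: one 3x3 square plus one isolated pixel.
img = [[0] * 8 for _ in range(8)]
for y in range(1, 4):
    for x in range(1, 4):
        img[y][x] = 1
img[6][6] = 1

spectrum = pattern_spectrum(img, 1)
```

Here the isolated pixel disappears at the first size step and the 3x3 square at the second, so the spectrum separates the two structures by size.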
-
Multi-human tracking from sparse detection responses
- Author(s): Y. Shen; Z. Miao
p. 590–602 (13)
In this study, the authors focus on improving the performance of multi-human tracking from sparse detection responses. Many previous detection-based data association tracking methods used dense detection responses as input, but did not take into account the case of sparse detection responses. Dense detection responses cannot always be obtained in complex environments. To address this, they propose a particle-filter-based triple threshold method to build reliable trajectories. Here, they apply a topic model to represent human appearance, so that the appearance of each person can be considered as a topic distribution. Then a cost function algorithm is used to associate these trajectories in a sliding time window to obtain the final tracking results. The cost function is composed of four parts: appearance cost, motion direction cost, object size cost and distance cost. These four parts are integrated into a unified formula to build this cost function. Finally, they use three challenging datasets to evaluate the performance of the authors' approach in the case of dense and sparse detection responses, respectively, and compare it with state-of-the-art approaches. The results show that their approach obtains better tracking performance than previous methods in both cases.
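A four-part association cost of the kind described above might be sketched as follows. The abstract does not give the unified formula, so everything here is an assumption for illustration: the weighted sum, the individual cost definitions (Bhattacharyya-based appearance cost, cosine-based direction cost, relative area difference, normalised Euclidean distance), the `max_dist` normaliser and the tracklet dictionary keys are all hypothetical.

```python
import math


def appearance_cost(p, q):
    """1 - Bhattacharyya coefficient between two topic distributions."""
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def direction_cost(v1, v2):
    """0 when motion directions agree, 1 when they are opposite."""
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.5  # direction undefined for a stationary track: neutral cost
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return (1.0 - cos) / 2.0


def size_cost(s1, s2):
    """Relative difference of bounding-box areas, in [0, 1]."""
    a1, a2 = s1[0] * s1[1], s2[0] * s2[1]
    largest = max(a1, a2)
    return abs(a1 - a2) / largest if largest > 0 else 0.0


def distance_cost(p1, p2, max_dist):
    """Euclidean distance normalised by max_dist and clipped to 1."""
    return min(math.hypot(p1[0] - p2[0], p1[1] - p2[1]) / max_dist, 1.0)


def association_cost(a, b, weights=(1.0, 1.0, 1.0, 1.0), max_dist=200.0):
    """Weighted sum of the four partial costs (an assumed combination rule)."""
    parts = (
        appearance_cost(a["topics"], b["topics"]),
        direction_cost(a["velocity"], b["velocity"]),
        size_cost(a["size"], b["size"]),
        distance_cost(a["position"], b["position"], max_dist),
    )
    return sum(w * c for w, c in zip(weights, parts))


track = {"topics": [0.7, 0.2, 0.1], "velocity": (1.0, 0.0),
         "size": (40, 100), "position": (100.0, 50.0)}
similar = {"topics": [0.7, 0.2, 0.1], "velocity": (2.0, 0.0),
           "size": (40, 100), "position": (110.0, 50.0)}
different = {"topics": [0.1, 0.2, 0.7], "velocity": (-1.0, 0.0),
             "size": (20, 50), "position": (300.0, 50.0)}

cost_similar = association_cost(track, similar)
cost_different = association_cost(track, different)
```

With these toy tracklets the similar candidate receives a much lower cost than the dissimilar one, which is the behaviour a data association step within a time window would rely on.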

