IET Computer Vision
Volume 11, Issue 1, February 2017
- Author(s): Xuguang Zhang ; Xufeng Zhang ; Yiming Wang ; Hui Yu
- Source: IET Computer Vision, Volume 11, Issue 1, p. 1–9
- DOI: 10.1049/iet-cvi.2016.0022
- Type: Article
It has been shown that the mean shift tracking algorithm can achieve excellent results in pedestrian tracking. It empirically estimates the target position in the current frame by locating the maximum of a density function in the local neighbourhood of the target position in the previous frame. However, when applied to pedestrian tracking, this method considers only the target's past trajectory and ignores the influence of the pedestrian's environment. In practice, pedestrians always keep a safe distance from obstacles when planning their paths. To address the issue of obstacle avoidance, this paper proposes a novel extended social force model-based mean shift tracking algorithm in which the pedestrian's environment is fully taken into consideration. Firstly, an extended social force model is presented to quantify the interaction between pedestrian and obstacle by means of force. Furthermore, directional weights and speed weights are introduced to adjust the strength of the force according to differences in individual perspectives and relative velocities. Finally, the initial target position is predicted by Newton's laws of motion, and the mean shift method is then applied to track the target position. Experimental results show that this algorithm achieves encouraging performance when obstacles exist.
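The prediction step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exponential form of the repulsion, the parameter names `strength` and `falloff`, and the unit mass and time step are all assumptions chosen for illustration, and the directional and speed weights mentioned in the abstract are omitted.

```python
import math

def repulsive_force(ped, obstacle, strength=2.0, falloff=1.0):
    """Repulsion pushing a pedestrian away from an obstacle.

    Assumed form: magnitude decays exponentially with distance.
    `strength` and `falloff` are hypothetical parameters.
    """
    dx, dy = ped[0] - obstacle[0], ped[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # direction undefined; apply no force
    mag = strength * math.exp(-dist / falloff)
    return (mag * dx / dist, mag * dy / dist)

def predict_position(pos, vel, force, dt=1.0, mass=1.0):
    """Constant-acceleration prediction from Newton's second law (a = F / m)."""
    ax, ay = force[0] / mass, force[1] / mass
    new_pos = (pos[0] + vel[0] * dt + 0.5 * ax * dt * dt,
               pos[1] + vel[1] * dt + 0.5 * ay * dt * dt)
    new_vel = (vel[0] + ax * dt, vel[1] + ay * dt)
    return new_pos, new_vel
```

In a tracker of this kind, the predicted position would serve as the starting point of the mean shift search in the next frame instead of the previous target position.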
- Author(s): Ray Khuboni and Bashan Naidoo
- Source: IET Computer Vision, Volume 11, Issue 1, p. 10–21
- DOI: 10.1049/iet-cvi.2015.0446
- Type: Article
This study presents an adaptive segmentation method for pre-processing input data to the patch-based multi-view stereo algorithm. A specially developed greyscale transformation is applied to the input image data, thus redefining the intensity histogram. The Nelder–Mead simplex method is used to adaptively locate an optimised segmentation threshold point in the modified histogram. The transformed input image is then segmented using the acquired threshold value, into foreground and background data. The segmentation information acquired is applied to the initial feature extraction and the cyclic patch-expansion procedure to constrain the reconstruction to a three-dimensional visibility space that excludes background artefacts. The method is targeted at segmenting out potentially disruptive data and is able to realise a reduction in cumulative error of the reconstruction process and thus improve the final reconstruction. With this method, the authors obtain results that are relatively similar to the original patch-based method, but with reduced time and space complexity.
- Author(s): Rinku Rabidas ; Jayasree Chakraborty ; Abhishek Midya
- Source: IET Computer Vision, Volume 11, Issue 1, p. 22–32
- DOI: 10.1049/iet-cvi.2016.0163
- Type: Article
Masses are one of the prevalent early signs of breast cancer visible in mammograms. However, their variation in shape, size, and appearance often hinders proper diagnosis of mammographic masses. This study analyses the 2D singularities of masses and their surrounding regions with the Ripplet-II transform to classify them as benign or malignant. Since benign and malignant masses may change the orientation patterns of normal breast tissues differently, several textural features, including Ripplet-II coefficients and statistical co-variates derived from the Ripplet-II transformed images, are extracted to quantify the texture information of mammographic regions. The important features are then selected using a stepwise logistic regression technique and evaluated using linear discriminant analysis and a support vector machine with ten-fold cross-validation. The best performance, in terms of area under the receiver operating characteristic curve (0.91 ± 0.01 and 0.83 ± 0.01) and accuracy (87.28 ± 0.02 and 75.60 ± 0.01), is obtained with the proposed method on 58 images from the mini-MIAS database and 200 images from the Digital Database for Screening Mammography, respectively.
- Author(s): Alexander Cerón ; Iván Mondragón ; Flavio Prieto
- Source: IET Computer Vision, Volume 11, Issue 1, p. 33–42
- DOI: 10.1049/iet-cvi.2015.0477
- Type: Article
In this study, the authors propose a new method for transmission tower detection that involves the use of visual features and the linear content of the scene. For this process, they developed a descriptor based on a grid of two-dimensional feature descriptors that is useful not only for object detection, but also for tracking the area of interest. For the detection and classification, they used a support vector machine. The experiments were conducted with a dataset of real world images from transmission tower videos that were used to validate the strategy by comparing it with the ground truth. The results showed that the obtained method is fast and appropriate for tower detection in video sequences of environments that include rural and urban areas. The detection took less than 50 ms and was faster than other methods.
- Author(s): S. Jafar Hosseini and Helder Araujo
- Source: IET Computer Vision, Volume 11, Issue 1, p. 43–49
- DOI: 10.1049/iet-cvi.2016.0006
- Type: Article
This study addresses the problem of monocular reconstruction of surfaces that deform isometrically, using points tracked in a single image. To tackle this problem, a flat three-dimensional (3D) shape of the surface and its image are used as template. Such deformations are characterised by certain geometric constraints and to reconstruct the surfaces, these constraints have to be properly exploited. Therefore, the authors propose an algebraic formula that aims at the joint expression of the geometric constraints, namely those based on the differential properties and also those based on the upper-bound model. This expression is, in fact, a unique formulation that results from integrating these two types of constraints, and leads to the intended reconstructions, even when the surface is not strictly isometric. The template shape is used to set the parameters of the expression, which is then optimised (along with the projection equations) by means of a semi-definite programming (SDP) problem. This optimisation enables the estimation of 3D positions of the points on the surface. However, and for implementation purposes, this optimisation is applied separately to patches which, together, make up the whole surface. The experimental results show that the proposed approach improves the results from other methods in terms of accuracy.
- Author(s): Nagarajan Pitchandi and Saravana Perumaal Subramanian
- Source: IET Computer Vision, Volume 11, Issue 1, p. 50–59
- DOI: 10.1049/iet-cvi.2016.0004
- Type: Article
Vision sensors are employed in robotic assembly systems to sense the dynamic environment and to position the manipulator precisely based on sensor feedback. This process is termed visual servoing. Precise calibration of the camera and of the camera/robot system is required to estimate the desired velocity of the robot and to position the mating parts accurately. In position-based visual servoing, a roughly calibrated camera leads to errors in robot/camera pose identification that affect the positional accuracy and the time to reach the target position. A camera calibration procedure based on a genetic algorithm (GA) is proposed in this study to estimate the intrinsic and extrinsic parameters of the camera model, with the aim of improving positional accuracy and speeding up convergence. The proposed algorithm is implemented as a two-stage procedure: determination of the camera parameters for a distortion-less model, followed by reduction of the re-projection error through the GA, with the linearly determined distortion-less camera parameters as an initial solution. The proposed camera calibration algorithm has been tested and compared, in terms of measurement accuracy, using dataset images from the literature. The results show that the proposed algorithm can calibrate distorted images with minimum re-projection error using a single image.
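The re-projection error that the second stage reduces can be made concrete with a small Python sketch. This is an illustration under stated assumptions rather than the authors' formulation: it uses a distortion-free pinhole model, takes the 3D points as already expressed in camera coordinates (so the extrinsic parameters are left out), and the function names `project` and `rms_reprojection_error` are hypothetical.

```python
import math

def project(point, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) with a distortion-free pinhole model."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def rms_reprojection_error(points_3d, observed_px, fx, fy, cx, cy):
    """Root-mean-square pixel distance between projected and observed points."""
    total = 0.0
    for p, (u, v) in zip(points_3d, observed_px):
        pu, pv = project(p, fx, fy, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(points_3d))
```

A search procedure such as a GA could use a quantity like this as the fitness to minimise over candidate camera parameters, starting from a linearly estimated solution.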
- Author(s): Lina Liu ; Shiwei Ma ; Ling Rui ; Jian Lu
- Source: IET Computer Vision, Volume 11, Issue 1, p. 60–67
- DOI: 10.1049/iet-cvi.2015.0482
- Type: Article
To address the incremental dimensionality reduction problem of existing non-linear dimensionality reduction methods, a novel algorithm based on locality constrained dictionary learning (LCDL) is proposed in this study. During the dictionary learning process, the neighbourhood size of some potential landmarks on a non-linear manifold is constrained to maintain the intrinsic local geometric features of the datasets. Meanwhile, to improve the dictionary's discrimination ability, a structured dictionary is learnt by LCDL, whose sub-dictionaries are class-specific. Sparse coding and its reconstruction errors are then used for classification. The experimental results of dimensionality reduction show that, compared with the existing methods, the proposed method can solve the out-of-sample extension and large-scale dataset problems efficiently. In addition, the experimental results of face, gender, and object category classification demonstrate that the authors’ algorithm outperforms some competing dictionary learning methods.
- Author(s): Muhammad Imran Shehzad ; Yasir A. Shah ; Zahid Mehmood ; Abdul Waheed Malik ; Shoaib Azmat
- Source: IET Computer Vision, Volume 11, Issue 1, p. 68–77
- DOI: 10.1049/iet-cvi.2016.0156
- Type: Article
This study presents a novel multiple objects tracking (MOT) approach that models an object's appearance based on K-means, while introducing a new statistical measure for associating objects after occlusion. The proposed method is tested on several standard datasets dealing with complex situations in both indoor and outdoor environments. The experimental results show that the proposed model successfully tracks multiple objects in the presence of occlusion with high accuracy. Moreover, the presented work can deal with long-term and complete occlusion without any prior training of the shape and motion model of the objects. The accuracy of the proposed method is comparable with that of existing state-of-the-art techniques, as it successfully deals with all MOT cases in the standard datasets. Most importantly, the proposed method is cost-effective in terms of memory and/or computation compared with existing state-of-the-art techniques. These traits make the proposed system very useful for real-time embedded video surveillance platforms, especially those with low memory/compute resources.
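As a rough illustration of a K-means appearance model, the following Python sketch clusters pixel colours of an object into a few representative centroids. The abstract does not specify the features, the initialisation, or the association measure, so everything here is an assumption: colours are plain tuples, the first `k` points seed the centroids for determinism, and the function name `kmeans` is hypothetical.

```python
def kmeans(points, k, iters=10):
    """Plain K-means on tuples of numbers.

    Assumption: the first k points are used as initial centroids so the
    result is deterministic; a real tracker may initialise differently.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the nearest centroid by squared Euclidean distance.
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids
```

Storing only a handful of centroids per object, rather than a full histogram or template, is one way such a model could keep memory use low on embedded platforms.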
- Author(s): Heng Wang ; Di Huang ; Yunhong Wang ; Hongyu Yang
- Source: IET Computer Vision, Volume 11, Issue 1, p. 78–86
- DOI: 10.1049/iet-cvi.2016.0074
- Type: Article
Facial aging simulation is one of the most challenging issues in automatic machine based face analysis, where the most essential requirements are (i) human identity should remain stable in texture synthesis and (ii) the texture synthesised is expected to accord with human cognitive perception in aging. In this study, the authors propose a tensor completion based method to transform the simulation task to a standard matrix completion one. To protect human dependent characteristics during texture synthesis, the proposed method processes the two major components, i.e. identity and age, in different channels. Furthermore, they incorporate prior information in such a process, assuming that the textures of different subjects in the same age group are similar and similar looking people tend to age in similar ways, and the metric learning technique is adopted to measure the similarity between identities so that the faces that have the highest similarities with the one in the test image are assigned bigger weights in texture generation. In addition, shape deformation is also considered to make the synthesised images more natural. Experimental results achieved on the FG-NET database demonstrate the effectiveness of the proposed method.
- Author(s): Jaeyong Ju ; Daehun Kim ; Bonhwa Ku ; David K. Han ; Hanseok Ko
- Source: IET Computer Vision, Volume 11, Issue 1, p. 87–95
- DOI: 10.1049/iet-cvi.2016.0068
- Type: Article
This study addresses the automatic multi-person tracking problem in complex scenes from a single, static, uncalibrated camera. In contrast with offline tracking approaches, a novel online multi-person tracking method is proposed based on a sequential tracking-by-detection framework, which can be applied to real-time applications. A two-stage data association is first developed to handle the drifting targets stemming from occlusions and people's abrupt motion changes. Subsequently, a novel online appearance learning is developed by using the incremental/decremental support vector machine with an adaptive training sample collection strategy to ensure reliable data association and rapid learning. Experimental results show the effectiveness and robustness of the proposed method while demonstrating its compatibility with real-time applications.
- Author(s): Dhanya S. Pankaj and Rama Rao Nidamanuri
- Source: IET Computer Vision, Volume 11, Issue 1, p. 96–103
- DOI: 10.1049/iet-cvi.2016.0080
- Type: Article
Multiview registration is an important stage in the three-dimensional modelling pipeline. Motion averaging is an efficient approach for multiview registration which utilises the redundancy in overlap among the scans. The averaging of the underlying relative motions is performed on the corresponding Lie-algebra elements of the SE(3) transformation matrices. However, this method is not robust and is affected by the presence of outliers in the set of relative motions. The authors present a graph-based approach to filter out the outliers before performing averaging of motions. The relative motions are assigned weights based on their agreement with global motions and other relative motions. The results indicate that the authors’ approach can efficiently filter out the outliers and can thus introduce robustness to multiview registration using motion averaging.
- Author(s): Shuohao Li ; Jun Zhang ; Qiang Guo ; Jun Lei ; Dan Tu
- Source: IET Computer Vision, Volume 11, Issue 1, p. 104–111
- DOI: 10.1049/iet-cvi.2015.0473
- Type: Article
Connecting visual imagery with descriptive language is a challenge for computer vision and machine translation. To approach this problem, the authors propose a novel end-to-end model to generate descriptions for images. Some early works used a convolutional neural network-long short-term memory (CNN-LSTM) model to describe the image, where a CNN encodes the input image into a feature vector and an LSTM decodes the feature vector into a description. Since the two-dimensional LSTM (2DLSTM) has the property of translation invariance and can encode the relationships between regions in an image, they not only apply a CNN to extract global features of an image, but also use a multidirectional 2DLSTM to encode the feature maps extracted by the CNN into structural local features. Their model is trained by maximising the likelihood of the target description sentence from the training dataset. Experiments on two challenging datasets show the accuracy of the model and the fluency of the language learned by the model. They compare the bilingual evaluation understudy score and retrieval metrics of their results with current state-of-the-art scores and show improvements on Flickr30k and MS COCO.
Extended social force model-based mean shift for pedestrian tracking under obstacle avoidance
Adaptive segmentation for multi-view stereo
Analysis of 2D singularities for mammographic mass classification
Real-time transmission tower detection from video based on a feature descriptor
SDP-based approach to monocular reconstruction of inextensible surfaces
GA-based camera calibration for vision-assisted robotic assembly system
Locality constrained dictionary learning for non-linear dimensionality reduction and classification
K-means based multiple objects tracking with long-term occlusion handling
Facial aging simulation via tensor completion and metric learning
Online multi-person tracking with two-stage data association and online appearance model learning
Robust algorithm for multiview registration
Generating image descriptions with multidirectional 2D long short-term memory