IET Computer Vision
Volume 12, Issue 6, September 2018
- Author(s): Huanlong Zhang ; Xiujiao Zhang ; Yong Wang ; Xiaoliang Qian ; Yanfeng Wang
- Source: IET Computer Vision, Volume 12, Issue 6, p. 763 –769
- DOI: 10.1049/iet-cvi.2017.0554
- Type: Article
Kernelised correlation filter (KCF)-based trackers have recently attracted considerable attention due to their accuracy and efficiency. Numerous improvements have since been made to cope with scale variation, partial occlusion, and related challenges. However, when there is abrupt motion between consecutive image frames, these trackers tend to fail. To alleviate this problem, the authors present an extended cuckoo search (CS)-based KCF tracker (called ECSKCF). First, the extended CS (ECS) algorithm is constructed by combining CS with the simplex method (SM): CS has a strong capability in global search, while the SM has a clear advantage in local search. Based on the ECS method, motion prediction is cast as a global search for the optimal position, enhancing the quality of the base image. Then, by combining ECS with a Gaussian distribution, a hybrid motion model capable of capturing abrupt motion is introduced into the KCF framework. Finally, a unified framework is designed to track smooth and abrupt motion simultaneously. Extensive experimental results, in both quantitative and qualitative measures, demonstrate the effectiveness of the proposed method for abrupt motion tracking.
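As an illustration of the global-search component, the following is a minimal sketch of a cuckoo-search-style optimiser in Python. It is not the authors' ECS implementation: the test function, heavy-tailed step approximation, and all parameters are illustrative assumptions.

```python
import random

def cuckoo_search(f, bounds, n_nests=15, iters=200, pa=0.25, seed=1):
    """Minimise f over a 2-D box with a simplified cuckoo search.
    Heavy-tailed (Levy-like) steps explore around the current best nest."""
    rng = random.Random(seed)
    lo, hi = bounds
    nests = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(n_nests)]
    fit = [f(n) for n in nests]
    best = min(range(n_nests), key=lambda i: fit[i])
    best_pos, best_f = nests[best][:], fit[best]
    for _ in range(iters):
        for i in range(n_nests):
            # Levy-like heavy-tailed step (ratio of Gaussians), scaled by
            # the distance to the current best nest
            step = [rng.gauss(0, 1) / max(abs(rng.gauss(0, 1)), 1e-9)
                    for _ in range(2)]
            cand = [min(hi, max(lo, nests[i][d]
                                + 0.01 * step[d] * (nests[i][d] - best_pos[d])))
                    for d in range(2)]
            cf = f(cand)
            if cf < fit[i]:           # greedy replacement of a worse nest
                nests[i], fit[i] = cand, cf
        # abandon a fraction pa of the worst nests (re-seed them randomly)
        worst = sorted(range(n_nests), key=lambda i: fit[i], reverse=True)
        for i in worst[:int(pa * n_nests)]:
            nests[i] = [rng.uniform(lo, hi), rng.uniform(lo, hi)]
            fit[i] = f(nests[i])
        cur = min(range(n_nests), key=lambda i: fit[i])
        if fit[cur] < best_f:         # keep the best solution found so far
            best_pos, best_f = nests[cur][:], fit[cur]
    return best_pos, best_f
```

In the paper, this global search is further refined by the simplex method and coupled to the KCF motion model; the sketch shows only the population-based search.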
- Author(s): Srikanth Vasamsetti ; Supriya Setia ; Neerja Mittal ; Harish K. Sardana ; Geetanjali Babbar
- Source: IET Computer Vision, Volume 12, Issue 6, p. 770 –778
- DOI: 10.1049/iet-cvi.2017.0013
- Type: Article
Moving object detection in a video sequence is a key task for marine scientists in exploration and monitoring applications. Videos acquired in the underwater environment are usually degraded by the physical properties of the water medium compared with images acquired in air, which affects the performance of feature descriptors. In this study, a new feature descriptor, the multi-frame triplet pattern (MFTP), is proposed for underwater moving object detection. The MFTP encodes the structure of a local region based on three sets of frames, calculated by considering local intensity differences between the centre pixel and its nine neighbours. Furthermore, the robustness of the proposed method is increased by integrating it with colour and motion features. The performance of the proposed framework is tested in seven experiments on the Fish4Knowledge database for underwater moving object detection. The results show a significant improvement over state-of-the-art techniques in terms of standard evaluation measures.
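A minimal sketch of a ternary, multi-frame local pattern in the spirit of the MFTP follows. The paper's exact encoding and nine-neighbour layout are not reproduced; this simplified version uses the 8-neighbourhood and a fixed threshold, both illustrative assumptions.

```python
def triplet_code(prev, cur, nxt, y, x, t=2):
    """Ternary encoding of the 8-neighbourhood differences of pixel (y, x)
    across three consecutive frames (an MFTP-inspired simplification).
    Each neighbour difference becomes one base-3 digit: +, flat, or -."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    code = 0
    for frame in (prev, cur, nxt):
        c = frame[y][x]
        for dy, dx in offs:
            d = frame[y + dy][x + dx] - c
            digit = 2 if d > t else (0 if d < -t else 1)
            code = code * 3 + digit   # append one ternary digit
    return code
```

A moving object changes the inter-frame differences at its boundary, so its triplet codes differ from those of a static background region.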
- Author(s): Shelda Sajeev ; Mariusz Bajger ; Gobert Lee
- Source: IET Computer Vision, Volume 12, Issue 6, p. 779 –786
- DOI: 10.1049/iet-cvi.2017.0586
- Type: Article
Finding masses in a dense background is a difficult task even for experienced radiologists, due to the similarity in intensity between the masses and the overlapping normal dense tissue. A novel method is proposed for the classification of masses localised in the dense background of the breast. Nine structured superpixel patterns were generated by applying the local binary pattern technique to superpixels. Analysis of these nine patterns revealed the most prominent ones, allowing for successful classification of malignant masses and normal dense breast regions. Two mammographic databases were used to evaluate the proposed approach: the publicly available digital database for screening mammography (DDSM), and a local database of mammograms (BreastScreen SA, BSSA). A total of 525 regions of interest (ROIs) were used (301 extracted from DDSM and 224 from BSSA), all localised in dense breast backgrounds. The results indicate that features generated from structured superpixel patterns can produce very effective and efficient texture descriptors of breast masses in dense backgrounds. Using a Fisher linear discriminant analysis classifier, an area under the receiver operating characteristic curve of 0.96 was achieved for DDSM and 0.93 for BSSA with only six features.
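The pattern generation builds on the standard local binary pattern (LBP). A minimal sketch of plain per-pixel LBP codes and their histogram, on a rectangular image rather than on superpixels as in the paper, might look like:

```python
def lbp8(img, y, x):
    """Standard 8-bit local binary pattern at pixel (y, x):
    one bit per neighbour, set when the neighbour >= the centre."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    c = img[y][x]
    code = 0
    for k, (dy, dx) in enumerate(offs):
        if img[y + dy][x + dx] >= c:
            code |= 1 << k
    return code

def lbp_histogram(img):
    """256-bin LBP histogram over all interior pixels: a simple
    texture descriptor of the region."""
    h = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            h[lbp8(img, y, x)] += 1
    return h
```

The paper aggregates such codes within superpixels to form nine structured patterns; the sketch shows only the underlying LBP descriptor.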
- Author(s): Kavitha Nagarathinam and Ruba Soundar Kathavarayan
- Source: IET Computer Vision, Volume 12, Issue 6, p. 787 –795
- DOI: 10.1049/iet-cvi.2017.0273
- Type: Article
The presence of shadows degrades the performance of many computer vision and video surveillance applications, as objects can be incorrectly classified. This article proposes a method for detecting moving shadows using the stationary wavelet transform (SWT) and Zernike moments (ZM), based on an automatic threshold determined from the wavelet coefficients. The multi-resolution and shift-invariance properties of the SWT make it suitable for change detection and feature extraction. To reduce redundant wavelet coefficients, ZM are applied. The novelty of the proposed method is the determination of a variant statistical threshold, the skewness, without any supervised learning or manual calibration. The experimental results show that the proposed threshold performs well, providing better discrimination between objects and shadows in various environments.
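The automatic threshold is derived from the skewness of the wavelet coefficients. A minimal sketch of the sample skewness statistic itself follows; the paper's exact thresholding rule is not reproduced here.

```python
def skewness(vals):
    """Sample skewness (third standardised central moment) of a list
    of values, e.g. wavelet coefficients of a region."""
    n = len(vals)
    mu = sum(vals) / n
    var = sum((v - mu) ** 2 for v in vals) / n
    if var == 0.0:
        return 0.0                      # flat data has no asymmetry
    return sum((v - mu) ** 3 for v in vals) / (n * var ** 1.5)
```

Positive skewness indicates a long right tail in the coefficient distribution; the sign and magnitude of this statistic are what the adaptive threshold reacts to.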
- Author(s): Bo Hu ; Leida Li ; Jiansheng Qian
- Source: IET Computer Vision, Volume 12, Issue 6, p. 796 –805
- DOI: 10.1049/iet-cvi.2017.0478
- Type: Article
Motion deblurring has been widely studied. However, quality evaluation of motion deblurred images remains an open problem. Motion deblurred images are usually contaminated by noise, ringing and residual blur (NRRB) simultaneously. Unfortunately, most existing quality metrics are not designed for multiply distorted images, so they are limited in predicting the quality of motion deblurred images. In this study, the authors propose a new quality metric for motion deblurred images by measuring NRRB. For a motion deblurred image, the noise level is first estimated. The ringing effect is then measured by incorporating a visual saliency model to adapt to the characteristics of the human visual system. A reblurring-based method is proposed to extract similarity features between a motion deblurred image and its re-blurred version for evaluating the residual blur. Finally, the overall quality score is obtained by pooling the scores for noise, ringing and blur. Experimental results on a motion deblurring database demonstrate that the proposed metric significantly outperforms existing quality metrics. In addition, the proposed NRRB metric is used to improve existing general-purpose no-reference metrics, with very encouraging results.
- Author(s): Jun Dou ; Dongmei Niu ; Zhiquan Feng ; Xiuyang Zhao
- Source: IET Computer Vision, Volume 12, Issue 6, p. 806 –816
- DOI: 10.1049/iet-cvi.2017.0550
- Type: Article
Point set registration is a fundamental problem in many domains of computer vision. In previous work on registration, the point sets are often represented using Gaussian mixture models and the registration process is formulated as a probabilistic solution. For non-rigid point set registration, however, the asymmetric Gaussian (AG) model can capture spatially asymmetric distributions that a symmetric Gaussian cannot, and the structural features of the point sets, which remain relatively complete, are of great significance in registration. In this work, the authors design a new shape context (SC) descriptor that combines the local and global structures of the point set. They further propose a non-rigid point set registration algorithm that formulates registration as mixture probability density estimation with the AG mixture model, introducing structural features through the new SC descriptor. Extensive experiments show that the proposed algorithm provides a clear improvement over state-of-the-art methods.
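A common form of the asymmetric Gaussian density, with separate spreads on each side of the mode, can be sketched as follows. This is a generic one-dimensional AG density, not the authors' mixture-model formulation.

```python
import math

def asym_gauss(x, mu, s_left, s_right):
    """Asymmetric Gaussian density: spread s_left applies below the mode mu,
    s_right above it. The shared normaliser keeps the total integral at 1."""
    s = s_left if x < mu else s_right
    norm = math.sqrt(2.0 / math.pi) / (s_left + s_right)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * s ** 2))
```

With s_left > s_right the density decays more slowly on the left, which is exactly the spatial asymmetry a symmetric Gaussian cannot express.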
- Author(s): Anh Minh Truong and Atsuo Yoshitaka
- Source: IET Computer Vision, Volume 12, Issue 6, p. 817 –825
- DOI: 10.1049/iet-cvi.2017.0487
- Type: Article
Understanding human activities has been an important research area in computer vision. In general, human interactions can be modelled as a temporal sequence of transitions in the relationships between humans and objects. Moreover, many studies have demonstrated the effectiveness of long short-term memory (LSTM) on long-term temporal dependency problems. Here, the authors propose a novel structured recurrent neural network (S-RNN) to model spatio-temporal relationships between human subjects and objects in daily human interactions. The evolution of the different components, and of the relationships between them over time, is represented by several subnets. The hidden representations of those relations are then fused and fed into later layers to obtain the final hidden representation, and the final prediction is carried out by a single-layer perceptron. Experimental results on different tasks on the CAD-120, SBU-Kinect-Interaction, multi-modal and multi-view and interactive, and NTU RGB+D data sets show advantages of the proposed method over state-of-the-art methods.
- Author(s): Somjit Nath ; Subhannita Sarcar ; Biswendu Chatterjee ; Rhishita Chourashi ; Nabendu Sekhar Chatterjee
- Source: IET Computer Vision, Volume 12, Issue 6, p. 826 –833
- DOI: 10.1049/iet-cvi.2017.0585
- Type: Article
The proposed method offers an inexpensive, portable, and easily accessible approach to the quantitative analysis of medical samples for disease detection using the enzyme-linked immunosorbent assay (ELISA). The procedure follows a point-of-care diagnostic model and addresses several challenges facing healthcare systems in rural settings. The proposed technique can alleviate the inconveniences faced by citizens of countries with insufficient resources to implement affordable healthcare for their entire population. A smartphone is used to capture images of an ELISA plate containing para-nitrophenol samples, which are then fed into a machine learning algorithm, specifically an artificial neural network. The introduction of two relatively new technologies in medical aid, the smartphone and machine learning, not only reduces the cost and time of detection, but also presents ample possibility for further development. The predictions yield highly accurate diagnostic labels. The same method can be applied to blood samples for predicting the presence of disease, provided an adequate training set is available.
- Author(s): Guorong Cai ; Songzhi Su ; Wenli He ; Yundong Wu ; Shaozi Li
- Source: IET Computer Vision, Volume 12, Issue 6, p. 834 –843
- DOI: 10.1049/iet-cvi.2017.0266
- Type: Article
Road detection is a fundamental component of autonomous driving systems, since it provides the valid driving space and candidate object regions for driving decisions. The core of road detection methods is extracting effective and discriminative features. Since two-dimensional (2D) and 3D features are complementary, the authors propose a robust multi-feature combination and optimisation framework for stereo image pairs, called Feature++. First, several 2D and 3D features, such as Gabor and plane features, are extracted after the generation of 2D super-pixels and a 3D depth image from stereo matching. Second, the combined features are fed into a three-layer shallow neural network classifier to decide whether a super-pixel belongs to the road region. Finally, the classified results are refined using a fully connected conditional random field (CRF), taking contextual information into consideration. The authors extensively evaluate the performance of four 2D features, four 3D features, and their combinations. Experiments on the KITTI ROAD benchmark show that (i) combining 2D and 3D features greatly improves road detection performance and (ii) using a CRF as a refinement step is necessary. Overall, the proposed Feature++ method outperforms most manually designed features and is comparable with state-of-the-art methods based on deep learning.
- Author(s): Aytac Cavent and Nazli Ikizler-Cinbis
- Source: IET Computer Vision, Volume 12, Issue 6, p. 844 –854
- DOI: 10.1049/iet-cvi.2017.0471
- Type: Article
This study presents a novel representation, based on hierarchical histograms of local feature sequences, for human interaction recognition. The method combines the power of discriminative sequence mining and histogram representations for effective recognition of human interactions. The framework first extracts visual features from the videos, and then mines sequences of visual features that occur consecutively in space and time. After the mining step, each video is represented with a histogram pyramid of such sequences. The authors also propose using soft clustering in the visual word construction step, so that more information-rich histograms can be obtained. Experimental results on challenging human interaction recognition data sets indicate that the proposed algorithm performs on par with state-of-the-art methods.
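The idea of histograms over feature sequences can be illustrated with a flat n-gram histogram over a quantised visual-word sequence. The paper's discriminative mining step and histogram pyramid are not reproduced; this is only the counting core.

```python
from collections import Counter

def sequence_histogram(words, max_n=3):
    """Histogram over contiguous visual-word n-grams (n = 1..max_n):
    a flat stand-in for a hierarchical histogram of feature sequences."""
    h = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            h[tuple(words[i:i + n])] += 1
    return h
```

Each video would contribute one such histogram; comparing histograms then amounts to comparing which short word sequences the two videos share.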
- Author(s): Hazem Hiary ; Heba Saadeh ; Maha Saadeh ; Mohammad Yaqub
- Source: IET Computer Vision, Volume 12, Issue 6, p. 855 –862
- DOI: 10.1049/iet-cvi.2017.0155
- Type: Article
Flower classification is a challenging task due to the wide range of flower species, many of which share similar shapes, appearance, or surrounding objects such as leaves and grass. In this study, the authors propose a novel two-step deep learning classifier to distinguish flowers across a wide range of species. First, the flower region is automatically segmented to allow localisation of the minimum bounding box around it; the proposed flower segmentation approach is modelled as a binary classifier in a fully convolutional network framework. Second, a robust convolutional neural network classifier is built to distinguish the different flower types, with novel steps during the training stage to ensure robust, accurate and real-time classification. The method is evaluated on three well-known flower datasets, and the classification results exceed 97% on all of them, better than the state-of-the-art in this domain.
- Author(s): Chuan Lin ; Guili Xu ; Yijun Cao
- Source: IET Computer Vision, Volume 12, Issue 6, p. 863 –872
- DOI: 10.1049/iet-cvi.2017.0661
- Type: Article
In the mammalian primary visual cortex, the response of the classical receptive field (CRF) to visual stimuli can be suppressed by inhibition from non-CRF (nCRF) neurons. Although many biologically plausible models based on these centre–surround interaction properties have been proposed, most fail to account for two important behaviours of neurons in the primary visual cortex (V1): first, the saturation of neuronal responses; second, fixational eye movements (FEyeMs). In the present study, the authors propose a biologically motivated contour detection approach based on these properties. The work is significant in that a simple threshold method is used to keep CRF responses within a meaningful range, and a multichannel filter bank is proposed to simulate the influence of FEyeMs on the nCRF. Both mechanisms effectively preserve object contours while inhibiting isolated textures. Extensive experiments indicate that the model preserves more object contours and suppresses more textures than previous biologically based models.
- Author(s): Lingfeng Qiao ; Hongya Tuo ; Jiexin Wang ; Chao Wang ; Zhongliang Jing
- Source: IET Computer Vision, Volume 12, Issue 6, p. 873 –881
- DOI: 10.1049/iet-cvi.2017.0438
- Type: Article
Zero-shot learning (ZSL) aims to classify objects without any training samples, using attributes to transfer knowledge from the training set to the test set. Most ZSL methods based on direct attribute prediction (DAP) assume that attributes are independent of each other. In this study, the authors explore the relationships between attributes and propose joint attribute chain prediction (JACP). Attribute chains are introduced to represent these relations: conditional probabilities of attributes are estimated in order along the chain to compute the joint posteriors of the test classes without independence assumptions. To reduce estimation error, an attribute-relation clustering algorithm is presented to split a long chain into several unrelated short chains. When the maximum chain length is one, JACP is essentially identical to DAP. Experiments on three zero-shot data sets demonstrate the classification accuracy and efficiency of the algorithm, showing that mining attribute relations can greatly improve ZSL performance.
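The chain-rule scoring that JACP relies on, contrasted with DAP's independence assumption, can be sketched as follows. The probability values in the test are illustrative, not estimates from any dataset.

```python
def dap_score(marginals, signature):
    """DAP-style score under attribute independence: the product of each
    attribute's marginal probability (or its complement when absent)."""
    p = 1.0
    for m, a in zip(marginals, signature):
        p *= m if a else (1.0 - m)
    return p

def jacp_score(conditionals):
    """Chain-rule joint score: conditionals[k] is the estimated
    P(a_k takes its signature value | a_1..a_{k-1} take theirs),
    so no independence assumption is needed."""
    p = 1.0
    for c in conditionals:
        p *= c
    return p
```

When the attributes really are independent, the conditionals reduce to the marginals and the two scores coincide, which mirrors the paper's remark that JACP with chain length one is essentially DAP.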
- Author(s): Zhenchong Zhao and Xiaodan Wang
- Source: IET Computer Vision, Volume 12, Issue 6, p. 882 –891
- DOI: 10.1049/iet-cvi.2017.0546
- Type: Article
The naïve Bayes (NB) classifier has shown remarkable performance in many real applications. However, the true probability distributions are usually unknown and tend to be quite complicated in high feature dimensions, and incorrect estimation models decrease classification performance. In this study, a new method, the multi-segment NB classifier, is proposed to reduce errors caused by improper estimation models by performing classification directly in the likelihood space rather than via posterior probabilities. The estimation of the conditional probability distribution is treated as a non-linear projection that maps the original features into the likelihood space. The mapped data are then divided into successive sub-segments, and the classifier in each segment is trained on the corresponding sub-dataset. The discriminant functions are learned through a distance-measure method instead of a probability-based one, and the parameters of the previous classifier are reused in the next training process to shrink the search space. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.
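The non-linear projection into likelihood space can be sketched with per-class log-likelihoods under diagonal Gaussian class models. The Gaussian family is an assumption here for concreteness; the paper's segmentation and distance-based training are not reproduced.

```python
import math

def loglik_features(x, class_models):
    """Map a feature vector x into likelihood space: one log-likelihood
    per class under a diagonal Gaussian (mu, var) class model. The output
    vector is the 'likelihood space' coordinate the classifier then uses."""
    out = []
    for mu, var in class_models:
        ll = 0.0
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
        out.append(ll)
    return out
```

A plain NB decision would take the argmax of this vector; the multi-segment method instead learns discriminant functions over these coordinates, which can partly compensate for a mis-specified class model.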
- Author(s): Ruigang Fu ; Biao Li ; Yinghui Gao ; Ping Wang
- Source: IET Computer Vision, Volume 12, Issue 6, p. 892 –899
- DOI: 10.1049/iet-cvi.2017.0636
- Type: Article
Most traditional convolutional neural network (CNN)-based classification models are flat classifiers, with the underlying assumption that all classes are equally difficult to distinguish. However, visual separability between different object categories is highly uneven in the real world. Hierarchical classification has recently proven effective for CNNs, and more and more attempts have been made to exploit category hierarchies in CNN models. In this study, the authors propose a novel hierarchical CNN architecture, called the coarse-to-fine CNN. It is simple: a proposed coarse-to-fine layer is placed on top of a generic CNN. The coarse-to-fine layer is inspired by the Bayesian equation, with the coarse prediction directly affecting the fine prediction, so an arbitrary CNN can perform hierarchical classification by adding the proposed layer. Training of a coarse-to-fine CNN is end-to-end and can be optimised by standard stochastic gradient descent; in the test phase, it outputs multiple hierarchical predictions simultaneously. Experimental results on the benchmark datasets MNIST, CIFAR-10, and CIFAR-100 show clear advantages over the compared baselines.
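The Bayesian-style coupling behind such a layer can be sketched as a post-hoc combination of coarse and fine softmax outputs, with the fine softmax standing in for P(fine | coarse). This is a simplification: in the paper the layer is part of the network and trained end-to-end.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def coarse_to_fine(coarse_logits, fine_logits, fine_to_coarse):
    """Combine predictions as P(fine) proportional to
    P(fine | its coarse class) * P(coarse): the coarse head re-weights
    each fine class by the probability of its parent category."""
    pc = softmax(coarse_logits)
    pf = softmax(fine_logits)
    out = [pf[i] * pc[fine_to_coarse[i]] for i in range(len(pf))]
    s = sum(out)
    return [v / s for v in out]
```

Even with a completely uninformative fine head, a confident coarse prediction shifts mass toward the fine classes inside the favoured coarse category.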
- Author(s): Jian Lian ; Sujuan Hou ; Xiaodan Sui ; Fangzhou Xu ; Yuanjie Zheng
- Source: IET Computer Vision, Volume 12, Issue 6, p. 900 –907
- DOI: 10.1049/iet-cvi.2018.0016
- Type: Article
Various image pre-processing tasks in optical coherence tomography (OCT) systems involve reversing degradation effects (e.g. deblurring). Current deblurring research mainly focuses on building suitable degradation models using deconvolution operators; however, model-based solutions may not work well in many scenarios. To address this, the authors propose a non-model-based architecture, a deep convolutional neural network, suited to parameter-free situations. The proposed solution employs a deep learning strategy to bridge the gap between traditional model-based methods and neural network architectures. Experiments on retinal OCT images demonstrate that the proposed approach achieves superior performance compared with state-of-the-art model-based OCT deblurring methods.
- Author(s): Peng Yao ; Hua Zhang ; Yanbing Xue ; Shengyong Chen
- Source: IET Computer Vision, Volume 12, Issue 6, p. 908 –918
- DOI: 10.1049/iet-cvi.2017.0599
- Type: Article
MeshStereo (MS) and cross-scale cost filtering (CSCF) are two recently celebrated models for stereo matching. On the one hand, the MS model enables fast solution of the dense stereo correspondence problem via a region-based formulation. On the other hand, the CSCF model can generate more robust matching cost volumes than a single scale. In this study, the authors weave these two models together to attain more accurate and faster disparity estimation. With CSCF, more powerful initial matching cost volumes are computed and used as the data term of the MS energy function. More importantly, the fused stereo model also draws a closer connection between multi-scale aggregation and global algorithms. Integrating the advantages of both models, the presented method is named MS with cross-scale (MSCS). Performance evaluations on the Middlebury v.2 and v.3 stereo data sets demonstrate that the proposed MSCS outperforms four other highly competitive stereo matching algorithms, and it also performs better on Microsoft i2i stereo videos. In addition, thanks to the fused model, MSCS requires fewer optimisation iterations, giving it a much faster execution time.
- Author(s): Qingqiang Wu ; Guanghua Xu ; Min Li ; Longting Chen ; Xin Zhang ; Jun Xie
- Source: IET Computer Vision, Volume 12, Issue 6, p. 919 –924
- DOI: 10.1049/iet-cvi.2017.0536
- Type: Article
Many current human pose estimation methods based on depth images require a training stage, which demands substantial effort to prepare samples, and many methods cannot handle occluded human poses well. In this study, a novel approach to estimating human pose from a depth image, called model-based recursive matching (MRM), is introduced. A human skeleton model with customised parameters is created based on the T-pose to fit different body types, and the depth image with its corresponding 3D point cloud is used as input. In contrast to previous work, the proposed method avoids the training step and can give accurate estimates under occlusion. The method is demonstrated by comparison with the random-forest-based method provided by Kinect on 20 human poses, with ground-truth joint coordinates obtained from a motion capture system. The results show that the proposed method not only works well on general human poses but also deals better with occlusion. The method can also be applied to people with disabilities and even to other creatures.
- Author(s): Seyed Ali Asghar Abbaszadeh Arani ; Ehsanollah Kabir ; Reza Ebrahimpour
- Source: IET Computer Vision, Volume 12, Issue 6, p. 925 –932
- DOI: 10.1049/iet-cvi.2017.0645
- Type: Article
In this study, a method for holistic recognition of handwritten Farsi words is proposed, which fuses the outputs of right-to-left (RtL) and left-to-right (LtR) hidden Markov models (HMMs). Experimental results on 16,000 images of 200 Iranian city names from the 'Iranshahr 3' dataset are presented and compared with methods using only RtL or LtR models. The results show that the main sources of error are words with similar beginnings or similar endings. Since the RtL and LtR models behave differently on such words, there is notable error diversity between the two classifiers, so that their combination increases the recognition rate. Compared with the RtL-HMM, taking the product of the output scores of the RtL and LtR HMMs reduces the classification error to about 6, 6 and 3% for three different feature sets. A subjective error analysis of the results is also provided.
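Product-rule fusion of the two HMMs' scores amounts to summing log-scores per word and taking the argmax over the vocabulary. A minimal sketch follows; the word labels and score values are purely illustrative, not taken from the Iranshahr 3 data.

```python
def fuse_product(log_scores_rtl, log_scores_ltr):
    """Product-rule fusion of two classifiers: adding log-scores per word
    is equivalent to multiplying the raw scores, then pick the argmax."""
    fused = {w: log_scores_rtl[w] + log_scores_ltr[w] for w in log_scores_rtl}
    return max(fused, key=fused.get)
```

Because the two direction-specific models err on different words (similar beginnings vs similar endings), a word only wins the fused decision when neither model strongly rejects it.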
- Author(s): Avinash Ratre and Vinod Pankajakshan
- Source: IET Computer Vision, Volume 12, Issue 6, p. 933 –940
- DOI: 10.1049/iet-cvi.2017.0469
- Type: Article
Anomaly detection and localisation (ADL) has gained remarkable interest, since manually monitoring complex surveillance videos for abnormal behaviour is tedious, and human monitoring and classification of abnormal objects is inaccurate and time-consuming. A method is therefore proposed using Tucker tensor decomposition (TTD) and object classification with a Gaussian mixture model (GMM). Initially, objects are detected in the frames using simple background subtraction for easy recognition. The TTD decomposes the tensor into a core tensor and factor matrices, and the two decomposed tensors are compared using the cosine similarity measure, which determines the location of the object in the frame. Finally, features including the shape and speed of the object are extracted and used for classification with the GMM, which follows the maximum posterior probability principle to detect and locate anomalies in the video. Experiments on anomaly detection show that the proposed TTD and TTD-GMM method attains multiple-object tracking precision, accuracy, sensitivity, and specificity of 0.96375, 0.975, 1, and 1, respectively.
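The tensor comparison step relies on cosine similarity. A minimal sketch over flattened tensors follows; the Tucker decomposition itself is not reproduced here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened tensors: the cosine of the
    angle between them, 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

In the paper this measure compares the decomposed tensors of candidate regions, so a high similarity localises where the tracked object lies in the frame.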
Extended cuckoo search-based kernel correlation filter for abrupt motion tracking
Automatic underwater moving object detection using multi-feature integration framework in complex backgrounds
Superpixel texture analysis for classification of breast masses in dense background
Moving shadow detection based on stationary wavelet transform and Zernike moments
Perceptual quality evaluation for motion deblurring
Robust non-rigid point set registration method based on asymmetric Gaussian and structural feature
Structured RNN for human interaction
Smartphone camera-based analysis of ELISA using artificial neural network
Combining 2D and 3D features to improve road detection based on stereo cameras
Histograms of sequences: a novel representation for human interaction recognition
Flower classification using deep convolutional neural networks
Contour detection model based on neuron behaviour in primary visual cortex
Joint attribute chain prediction for zero-shot learning
Multi-segments Naïve Bayes classifier in likelihood space
CNN with coarse-to-fine layer for hierarchical classification
Deblurring retinal optical coherence tomography via a convolutional neural network with anisotropic and double convolution layer
MSCS: MeshStereo with Cross-Scale Cost Filtering for fast stereo matching
Human pose estimation method based on single depth image
Combining RtL and LtR HMMs to recognise handwritten Farsi words of small- and medium-sized vocabularies
Tucker tensor decomposition-based tracking and Gaussian mixture model for anomaly localisation and detection in surveillance videos