IET Computer Vision
Volume 13, Issue 1, February 2019
- Author(s): Guisik Kim and Junseok Kwon
- Source: IET Computer Vision, Volume 13, Issue 1, p. 1–7
- DOI: 10.1049/iet-cvi.2018.5359
- Type: Article
Here, the authors propose a novel tracking algorithm that can automatically modify the initial configuration of a target to improve the tracking accuracy in subsequent frames. To achieve this goal, the authors’ method analyses the likelihood landscape (LL) for the image patch described by the initial configuration. A good configuration has a unimodal distribution with a steep shape in the LL. Using the LL analysis, the authors’ method improves the initial configuration, resulting in more accurate tracking results. The authors improve the conventional LL analysis based on two ideas. First, the authors’ method analyses the LL in the RGB space rather than the grey space. Second, the method introduces an additional criterion for a good configuration: a high likelihood value at the mode. The authors further enhance their method through post-processing of the visual tracking results at each frame, where the estimated bounding boxes are modified by the LL analysis. The experimental results demonstrate that the authors’ advanced LL analysis helps improve the tracking accuracy of several baseline trackers on a visual tracking benchmark data set. In addition, the authors’ simple post-processing technique significantly enhances the visual tracking performance in terms of precision and success rate.
- Author(s): Wenbin Xie ; Hong Yin ; Meini Wang ; Yan Shao ; Bosi Yu
- Source: IET Computer Vision, Volume 13, Issue 1, p. 8–14
- DOI: 10.1049/iet-cvi.2018.5256
- Type: Article
A novel abnormity detection method is presented which combines low-rank structured sparse representation and reduced dictionary learning. The multi-scale three-dimensional gradient is used as a low-level feature that encodes the spatiotemporal information. A group of reduced sparse dictionaries is learnt by low-rank approximation based on the structured sparsity property of the video sequence. The contribution of this study is three-fold: (i) the normal feature clusters can be represented effectively by the reduced dictionaries, which are learnt based on the low-rank nature of the data; (ii) the size of each dictionary is determined adaptively by the sparse learning method according to the scene, which makes the representation more compact and efficient; and (iii) the proposed abnormity detection method has low time complexity, so real-time detection can be achieved. The authors have evaluated the proposed method against state-of-the-art methods on public datasets, and very promising results have been achieved.
- Author(s): Sachin Chaudhary and Subrahmanyam Murala
- Source: IET Computer Vision, Volume 13, Issue 1, p. 15–22
- DOI: 10.1049/iet-cvi.2018.5020
- Type: Article
Recognition of human actions from videos can be improved if depth information is available, because depth helps in separating foreground motion from the background. Single image depth estimation (SIDE) is a commonly used method for the analysis of weather-degraded images. In this study, the idea of SIDE is extended to human action recognition (HAR) on datasets where depth information is not available. Several depth-based HAR algorithms are available, but all of them use the depth information supplied with the dataset. Some other methods use a depth motion map, which refers to the depth of motion in the temporal direction. Here, a new depth-based end-to-end deep network is proposed for HAR in which the frame-wise depth is estimated and this estimated depth is used for processing instead of the RGB frame. As colour information is not required for estimating motion, a single-channel depth map is used for estimating motion in the video, which makes the system computationally efficient. The proposed method is tested and verified on three benchmark datasets: JHMDB, HMDB51 and UCF101. The proposed method outperforms the existing state-of-the-art methods for HAR on all three tested datasets.
- Author(s): Michael George ; Babita Roslind Jose ; Jimson Mathew ; Pranjali Kokare
- Source: IET Computer Vision, Volume 13, Issue 1, p. 23–30
- DOI: 10.1049/iet-cvi.2018.5240
- Type: Article
The spread of surveillance cameras has necessitated the monitoring of large quantities of surveillance video feeds. Manual monitoring is nearly impossible owing to the large man-hour requirements. Recently, automatic abnormal activity detection has been an area of interest among researchers. A spatio-temporal feature, histogram of optical flow orientation and magnitude (HOFM), has shown an impressive ability to detect abnormal activities. The authors propose a novel non-uniform spatio-temporal region resembling parallelepipeds, from which they extract the HOFM features. Autoencoders can be configured to detect abnormal patterns, and the authors use this ability to detect abnormalities in the HOFM features extracted from their novel spatio-temporal regions of the video feeds. The autoencoders are trained on the HOFM features of videos containing no abnormalities. The autoencoders are then fed with the HOFM features of the videos to be tested, and abnormal activities are detected based on how well the autoencoders can reconstruct these features. The proposed method is tested on the standard abnormality detection datasets: UCSD Ped1, UCSD Ped2, Subway Entrance, Subway Exit, and UMN.
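The train-on-normal, score-by-reconstruction scheme described in this abstract can be sketched roughly as follows. This is not the authors' network: the sketch swaps in a linear autoencoder (whose optimal weights under squared error are given by PCA), uses synthetic vectors as stand-ins for HOFM features, and assumes a percentile-based threshold. All function names, dimensions, and the threshold rule are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(normal_features, code_size):
    """Fit a linear autoencoder in closed form (PCA) on normal-only features."""
    mean = normal_features.mean(axis=0)
    # Rows of vt are principal directions; keep the top `code_size` of them.
    _, _, vt = np.linalg.svd(normal_features - mean, full_matrices=False)
    return mean, vt[:code_size]

def reconstruction_error(features, mean, components):
    """Encode, decode, and return the squared reconstruction error per row."""
    centered = features - mean
    codes = centered @ components.T      # encoder
    reconstructed = codes @ components   # decoder
    return np.sum((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical stand-in for HOFM features: normal data lies near a
# 3-D subspace of a 16-D feature space.
basis = rng.normal(size=(3, 16))
normal_train = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 16))
normal_test = rng.normal(size=(100, 3)) @ basis + 0.05 * rng.normal(size=(100, 16))
abnormal_test = 2.0 * rng.normal(size=(100, 16))  # ignores the normal subspace

mean, components = fit_linear_autoencoder(normal_train, code_size=3)
# Assumed threshold rule: 99th percentile of the training reconstruction error.
threshold = np.percentile(reconstruction_error(normal_train, mean, components), 99)

normal_flags = reconstruction_error(normal_test, mean, components) > threshold
abnormal_flags = reconstruction_error(abnormal_test, mean, components) > threshold
```

Features that the model reconstructs poorly exceed the threshold and are flagged; a non-linear autoencoder would replace the two helper functions without changing the scoring logic.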
- Author(s): Murari Mandal ; Mallika Chaudhary ; Santosh Kumar Vipparthi ; Subrahmanyam Murala ; Anil Balaji Gonde ; Shyam Krishna Nagar
- Source: IET Computer Vision, Volume 13, Issue 1, p. 31–43
- DOI: 10.1049/iet-cvi.2018.5206
- Type: Article
In this study, new feature descriptors are designed for medical image retrieval and change detection applications. Inspired by isomerism, the authors propose a novel feature descriptor named antithetic isomeric cluster pattern (ANTIC). The ANTIC is defined by two properties: cluster patterns and antithetic isomerism (ANTI). The cluster pattern corresponds to successive pixel intensity differences at antithetical orientations. The ANTI is characterised by two aspects: first, the clusters are oppositely oriented (antithetical) to each other and second, both adhere to a defined isomeric property. The ANTIC identifies line and corner point information in the local neighbourhood across various directions. To attain enhanced robustness, the authors further propose multiresolution ANTIC by integrating a multiresolution Gaussian filter. Moreover, to reduce the feature dimensionality, they extend their work to rotation-invariant features. The proposed method outperforms other widely used feature descriptors in biomedical and retinopathy image retrieval applications. In addition, they extract spatiotemporal features by designing intra-ANTIC and inter-ANTIC to detect motion changes in video sequences, and validate the effectiveness of these features by conducting experiments on the CDNet 2014 dataset. The proposed descriptor achieves better performance in various challenging conditions for change detection as compared to other state-of-the-art techniques.
- Author(s): Changda Xing ; Zhisheng Wang ; Fanliang Meng ; Chong Dong
- Source: IET Computer Vision, Volume 13, Issue 1, p. 44–52
- DOI: 10.1049/iet-cvi.2018.5027
- Type: Article
Edge-preserving filters have been applied to multi-scale decomposition (MSD) for fusion of infrared and visible images. Traditional edge-preserving MSDs may fail to separate structure from details satisfactorily, which degrades fusion performance. To address this problem, the authors propose a novel fusion of infrared and visible images with Gaussian smoothness and joint bilateral filtering iteration decomposition (MSD-Iteration). This method consists of three steps. First, source images are decomposed by the Gaussian smoothness and joint bilateral filtering iteration. The implementation includes fine-scale detail removal with Gaussian filtering, edge and structure extraction with joint bilateral filtering iteration, and detail extraction at multiple scales. The decomposition has edge-preserving and scale-aware properties to improve detail acquisition. Second, rules are designed to conduct the layer combination. For the base layers, saliency maps are constructed by Laplacian and Gaussian low-pass filters to calculate initial weight maps, and a guided filter is then applied to determine the final weight maps for the combination. To combine the detail layers, the authors use regional average energy weighting, constructing the intensity deviation to obtain decision maps at multiple scales. Third, they implement the reconstruction with the combined layers. Extensive experiments are presented to evaluate MSD-Iteration, and the experimental results validate the superiority of the authors’ method.
- Author(s): Hao Zhou ; Anqi Han ; Haodong Yang ; Jun Zhang
- Source: IET Computer Vision, Volume 13, Issue 1, p. 53–60
- DOI: 10.1049/iet-cvi.2018.5035
- Type: Article
Image semantic segmentation is a challenging problem for low-level computer vision. Recently, deep convolutional neural networks (DCNNs) have been shown to achieve outstanding performance in image semantic segmentation. Most current methods, however, still have problems in segmenting object edges and preserving the integrity of objects. In this study, the authors first construct a difference-pooling module in the DCNNs to extract the object edge gradients and obtain finer boundaries in the segmentation results. Then the combination of the pyramid pooling module and atrous spatial pyramid pooling extracts the global image features and the context structure information by building long-distance dependencies between pixels, which acts like a simple fully connected conditional random field (CRF). Unlike other methods, the proposed method does not need extra pre-processing and post-processing steps, such as extracting gradient features by traditional algorithms and building context relationships by CRF. Finally, the experimental results on the PASCAL VOC2012 benchmark indicate that the proposed model can obtain finer boundaries and more complete object parts.
- Author(s): Ya Chao ; Xingchen Chen ; Nanfeng Xiao
- Source: IET Computer Vision, Volume 13, Issue 1, p. 61–70
- DOI: 10.1049/iet-cvi.2018.5002
- Type: Article
To improve the accuracy of robotic grasping in uncertain environments, a deep learning-based object-detection method for a five-fingered industrial robot hand model is proposed in this study. The authors first design a five-fingered industrial robot hand model with 21 degrees of freedom (DOF). Based on the sensor data of a 5DT data glove, the industrial robot hand can be controlled in real time. They use two object-detection networks, the faster region-based convolutional neural network (Faster R-CNN) and the single shot multibox detector (SSD), to locate the objects to be grasped. To optimise the robotic grasp detection, two grasp-predictor methods, a direct grasp predictor and a multi-modal grasp predictor, are applied to obtain the best graspable region. In the simulation designed in this study, the five-fingered industrial robot hand, cooperating with a 6-DOF robot arm, can detect an object accurately and grasp it steadily.
- Author(s): Long Gao ; Yunsong Li ; Jifeng Ning
- Source: IET Computer Vision, Volume 13, Issue 1, p. 71–78
- DOI: 10.1049/iet-cvi.2018.5138
- Type: Article
Support vector machine (SVM) based tracking algorithms trained with dense circulant samples have shown favourable performance owing to their strong discriminative power and high efficiency. However, the challenges caused by circulant sampling remain unaddressed. In this study, the authors first give each training sample a weight based on its accuracy to reduce the influence of inaccurate samples, and reformulate the SVM model with weighted circulant training samples. Second, they advocate an efficient solution that uses the property of circulant matrices to solve the learning problem. Third, a model update strategy is introduced to prevent the tracking models from being polluted by wrong samples. Experimental results on large benchmark datasets with 50 and 100 video sequences demonstrate that the authors’ tracking algorithms achieve state-of-the-art performance in terms of precision and accuracy. In addition, their tracker runs in real time.
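The efficiency that circulant structure offers rests on a standard fact: a circulant matrix is diagonalised by the discrete Fourier transform, so multiplying by it is a circular convolution that the FFT computes in O(n log n). The sketch below checks that identity numerically. It illustrates only this general property, not the authors' weighted SVM solver, and the helper names are hypothetical.

```python
import numpy as np

def circulant_from_first_column(c):
    """Build the dense circulant matrix whose first column is c."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def circulant_matvec_fft(c, x):
    """Compute C @ x without forming C, using the convolution theorem."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

rng = np.random.default_rng(0)
c = rng.normal(size=8)  # base sample; its cyclic shifts form the matrix
x = rng.normal(size=8)

dense_result = circulant_from_first_column(c) @ x  # O(n^2) reference
fft_result = circulant_matvec_fft(c, x)            # O(n log n) equivalent
```

Because the dense matrix never has to be stored or inverted explicitly, learning problems built on all cyclic shifts of a sample can be solved element-wise in the Fourier domain.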
- Author(s): Ting Cao ; Weixing Wang ; Susan Tighe ; Shenglin Wang
- Source: IET Computer Vision, Volume 13, Issue 1, p. 79–85
- DOI: 10.1049/iet-cvi.2018.5337
- Type: Article
In civil engineering, crack detection using image processing has gained much attention among researchers and transportation agencies. As a crack image often presents a fuzzy boundary and a random shape, it is difficult to achieve satisfactory detection performance. This study proposes a crack detection method based on the fractional differential and fractal dimension. This method achieves image enhancement and crack extraction in two stages. First, an image enhancement algorithm based on the fractional differential is applied to address the fuzzy crack boundary. This algorithm can enhance the crack boundary information significantly while simultaneously maintaining texture details. Second, an improved extraction algorithm based on the fractal dimension is studied. This algorithm can effectively accomplish crack extraction according to shape features. Finally, in comparisons with classic and state-of-the-art methods, the experiments show that the proposed method can achieve satisfactory results for crack image detection.
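As a rough illustration of how a fractal dimension can characterise shape in a binary image, the sketch below uses the common box-counting estimator: count the boxes of side s that contain any foreground pixel, then fit the slope of log(count) against log(1/s). This is a generic estimator assumed for illustration, not the improved extraction algorithm of the paper; the function name and box sizes are hypothetical.

```python
import numpy as np

def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of the foreground of a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for size in box_sizes:
        # Crop so the image tiles exactly into size-by-size boxes.
        h = mask.shape[0] // size * size
        w = mask.shape[1] // size * size
        blocks = mask[:h, :w].reshape(h // size, size, w // size, size)
        # A box is occupied if any pixel inside it is foreground.
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    # Slope of log(count) versus log(1/size) is the dimension estimate.
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return slope

line_mask = np.zeros((64, 64), dtype=bool)
line_mask[32, :] = True                       # thin, crack-like line
filled_mask = np.ones((64, 64), dtype=bool)   # solid region

line_dimension = box_counting_dimension(line_mask)
filled_dimension = box_counting_dimension(filled_mask)
```

A thin line yields an estimate near 1 and a filled region near 2, so the value gives one way to separate elongated structures from blob-like ones. The mask must contain at least one foreground pixel, since the logarithm of a zero count is undefined.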
Robust visual tracking with adaptive initial configuration and likelihood landscape analysis
Low-rank structured sparse representation and reduced dictionary learning-based abnormity detection
Depth-based end-to-end deep network for human action recognition
Autoencoder-based abnormal activity detection using parallelepiped spatio-temporal region
ANTIC: antithetic isomeric cluster patterns for medical image retrieval and change detection
Fusion of infrared and visible images with Gaussian smoothness and joint bilateral filtering iteration decomposition
Edge gradient feature and long distance dependency for image semantic segmentation
Deep learning-based grasp-detection method for a five-fingered industrial robot hand
Maximum margin object tracking with weighted circulant feature maps
Crack image detection based on fractional differential and fractal dimension
Most cited content for this Journal
-
Brain tumour classification using two-tier classifier with adaptive segmentation technique
- Author(s): V. Anitha and S. Murugavalli
- Type: Article
-
Driving posture recognition by convolutional neural networks
- Author(s): Chao Yan ; Frans Coenen ; Bailing Zhang
- Type: Article
-
Local directional mask maximum edge patterns for image retrieval and face recognition
- Author(s): Santosh Kumar Vipparthi ; Subrahmanyam Murala ; Anil Balaji Gonde ; Q.M. Jonathan Wu
- Type: Article
-
Fast and accurate algorithm for eye localisation for gaze tracking in low-resolution images
- Author(s): Anjith George and Aurobinda Routray
- Type: Article
-
‘Owl’ and ‘Lizard’: patterns of head pose and eye pose in driver gaze classification
- Author(s): Lex Fridman ; Joonbum Lee ; Bryan Reimer ; Trent Victor
- Type: Article