IET Computer Vision
Volume 13, Issue 4, June 2019
-
- Author(s): Yingkun Xu ; Xiaolong Zhou ; Shengyong Chen ; Fenfen Li
- Source: IET Computer Vision, Volume 13, Issue 4, p. 355 –368
- DOI: 10.1049/iet-cvi.2018.5598
- Type: Article
Deep learning has proved effective in multiple object tracking, which faces the difficulties of frequent occlusions, confusing appearance, objects entering and leaving the scene, and a lack of labelled data. Recently, deep-learning-based multi-object tracking methods have progressed rapidly from representation learning to network modelling, owing to the development of deep learning theory and the establishment of benchmarks. In this study, the authors summarise and analyse the deep-learning-based multi-object tracking methods that are top-ranked in the public benchmark tests. First, they investigate the function of the deep networks in these methods and classify the methods into three categories: description enhancement using deep features, deep network embedding, and end-to-end deep network construction. Second, they review the deep network structures in these methods and detail how these networks are used and trained for the multi-object tracking problem. Through an experimental comparison of tracking results on the benchmarks, both overall and by group, they finally show the effectiveness of deep networks employed for tracking in different manners, and compare the advantages of these networks and their robustness under different tracking conditions. Moreover, they analyse the limitations of current methods and draw some conclusions to facilitate the exploration of new directions for multi-object tracking.
Deep learning for multiple object tracking: a survey
-
- Author(s): Fateme Mostajer Kheirkhah and Habibollah Asghari
- Source: IET Computer Vision, Volume 13, Issue 4, p. 369 –375
- DOI: 10.1049/iet-cvi.2018.5028
- Type: Article
The leaves of plants carry rich information for plant recognition. In general, agriculture experts extract this information from the leaves by hand. Since the leaves contain useful features for recognising various types of plants, these features can be extracted and applied by automatic image recognition algorithms to classify plant species. In this study, the authors investigate a novel approach for recognising plant species using GIST texture features. The principal and most suitable features are then selected by the principal component analysis (PCA) algorithm. In the classification step, three different classifiers, namely a Patternnet neural network, a support vector machine, and the K-nearest neighbour (KNN) algorithm, were applied to the extracted features. To evaluate their approach, the authors applied the proposed algorithm to three well-known datasets. In comparison with some widely used features, the results show that their approach outperforms the other methods in both running time and accuracy. The best results were achieved by applying the PCA algorithm to the GIST feature vector and using the Cosine KNN classifier.
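The final classification step named in this abstract, a KNN classifier with cosine distance, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy two-dimensional vectors, and the choice k=3 are assumptions, and the GIST extraction and PCA reduction that would produce the real feature vectors are omitted.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cos(angle) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def cosine_knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest in cosine distance."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: cosine_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for PCA-reduced GIST features (hypothetical values).
train_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = ["oak", "oak", "maple", "maple"]
print(cosine_knn_predict(train_x, train_y, [0.8, 0.2], k=3))
```

Cosine distance ignores vector magnitude, so only the direction of the feature vector influences the vote.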
- Author(s): Congcong Jin ; Jihua Zhu ; Yaochen Li ; Shanmin Pang ; Lei Chen ; Jun Wang
- Source: IET Computer Vision, Volume 13, Issue 4, p. 376 –384
- DOI: 10.1049/iet-cvi.2018.5296
- Type: Article
Recently, low-rank and sparse (LRS) matrix decomposition has been introduced as an effective means of solving the multi-view registration problem. It views each available relative motion as a block element to reconstruct one sparse matrix, which is then used to approximate the low-rank matrix, from which the global motions can be recovered for multi-view registration. However, this approach is sensitive to the sparsity of the reconstructed matrix, and it treats all block elements equally in spite of their varied reliabilities. Therefore, this study proposes an effective approach for multi-view registration by weighted LRS matrix decomposition. On the basis of the inverse symmetry property of relative motions, it first proposes a completion method to reduce the sparsity of the reconstructed matrix. The reduced sparsity of the reconstructed matrix can improve the robustness and efficiency of LRS matrix decomposition. Then, it proposes the weighted LRS matrix decomposition, where each block element is assigned one estimated weight to denote its reliability. By introducing the weight, more accurate registration results can be efficiently recovered from the estimated low-rank matrix. Experimental results on public datasets illustrate the superiority of the proposed approach over the state-of-the-art approaches in robustness, accuracy and efficiency.
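The inverse symmetry property mentioned in this abstract can be illustrated with a short sketch: if the relative motion between views i and j is a rigid transform, the motion between j and i is its inverse, so a missing block can be filled from its available counterpart. The code below is an assumption-laden illustration, not the authors' completion method: the dictionary layout, the function names, and the 4x4 homogeneous representation are all hypothetical choices.

```python
def invert_rigid(m):
    """Invert a 4x4 rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    r_t = [[m[c][r] for c in range(3)] for r in range(3)]  # R transposed
    t = [m[r][3] for r in range(3)]
    new_t = [-sum(r_t[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [r_t[r] + [new_t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def complete_by_inverse_symmetry(motions):
    """Fill each missing block (j, i) with the inverse of an available (i, j).

    `motions` maps a view pair (i, j) to a 4x4 relative motion (hypothetical layout).
    """
    completed = dict(motions)
    for (i, j), m in motions.items():
        if (j, i) not in completed:
            completed[(j, i)] = invert_rigid(m)
    return completed


# One available relative motion: 90-degree rotation about z plus a translation.
motions = {
    (0, 1): [[0.0, -1.0, 0.0, 1.0],
             [1.0, 0.0, 0.0, 2.0],
             [0.0, 0.0, 1.0, 3.0],
             [0.0, 0.0, 0.0, 1.0]],
}
completed = complete_by_inverse_symmetry(motions)
print(sorted(completed))
```

After completion the block matrix has entries for both (0, 1) and (1, 0), which is the sense in which sparsity is reduced here.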
- Author(s): Xiaoyue Xu and Ying Chen
- Source: IET Computer Vision, Volume 13, Issue 4, p. 385 –394
- DOI: 10.1049/iet-cvi.2018.5130
- Type: Article
Existing person re-identification (re-id) models mainly follow a still-image-based approach, namely matching person images across non-overlapping camera views. Since a video sequence contains much more information than still images and can be easily obtained by tracking algorithms in practical applications, video re-id has attracted increasing attention in recent years. Distance learning is crucial for a re-id system. However, the distances computed in traditional video-based methods are easily distorted by the randomness of the data distribution, especially when the training sample size is small. To better distinguish different people, a novel video-based person re-id method using regularised hull distance learning is proposed. It is advantageous in two respects: robustness is obtained by expanding the limited video samples with a regularised affine hull, and discriminability is obtained by penalising hard negative samples more severely. Hence, the discriminability and robustness of the learnt metric are strengthened. Comparisons with the state-of-the-art video-based methods as well as related methods on the PRID 2011, iLIDS-VID and MARS datasets demonstrate the superiority of the authors’ method.
- Author(s): Usman Muhammad ; Weiqiang Wang ; Abdenour Hadid ; Shahbaz Pervez
- Source: IET Computer Vision, Volume 13, Issue 4, p. 395 –403
- DOI: 10.1049/iet-cvi.2018.5069
- Type: Article
The bag-of-words (BoW) model has been widely used for scene classification in recent state-of-the-art methods. However, inter-class similarity among scene categories and very high spatial resolution imagery limit its performance in the remote-sensing domain. Therefore, this research presents a new KAZE-based image descriptor that makes use of the BoW approach to substantially increase classification performance. First, a novel multi-neighbourhood KAZE is proposed for small image patches. Second, spatial pyramid matching and the BoW representation are adopted to use the extracted features and form an innovative BoW KAZE (BoWK) descriptor. Third, two bags of multi-neighbourhood KAZE features are selected, in which each bag is regarded as a separate feature descriptor. Next, canonical correlation analysis is introduced as a feature fusion strategy to further refine the BoWK features, which allows a more effective and robust fusion approach than the traditional feature fusion strategies. Experiments on three challenging remote-sensing data sets show that the proposed BoWK descriptor not only surpasses the conventional KAZE descriptor but also yields significantly higher classification performance than current state-of-the-art methods. Moreover, the proposed BoWK approach produces richly informative features to describe the scene images with low computational cost and a much lower dimension.
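The generic BoW representation that this abstract builds on can be sketched as a histogram of nearest codewords. This is a plain textbook illustration under stated assumptions, not the BoWK descriptor itself: the codebook is given rather than learnt, the descriptors are toy two-dimensional vectors rather than KAZE features, and the function name is hypothetical.

```python
def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return normalised counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the codeword with the smallest squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy codebook with two visual words and four local descriptors (hypothetical values).
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [1.0, 1.0]]
print(bow_histogram(descriptors, codebook))
```

The resulting fixed-length histogram is what a downstream classifier would consume, regardless of how many local descriptors an image yields.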
- Author(s): Wei Lian ; Junyi Zuo ; Zeyu Ding
- Source: IET Computer Vision, Volume 13, Issue 4, p. 404 –410
- DOI: 10.1049/iet-cvi.2018.5366
- Type: Article
To address the 3D point matching problem where the pose difference between two point sets is unknown, the authors propose a path following (PF)-based algorithm. This method works by reducing the objective function of the robust point matching (RPM) algorithm to a function of the point correspondence variable and then using PF for optimisation. By using the 3D similarity transformation, which has few parameters, the authors’ method needs no regularisation on the transformation and can therefore handle the case where the pose difference between two point sets is unknown. The authors also propose a novel convex term for use in the PF algorithm, which is based on the low-rank nature of their objective function and leads to a PF algorithm that converges quickly. Experimental results demonstrate that the proposed method is more robust than state-of-the-art methods and is also efficient.
- Author(s): Rui Li ; Xiaodan Wang ; Lei Lei ; Chongming Wu
- Source: IET Computer Vision, Volume 13, Issue 4, p. 411 –419
- DOI: 10.1049/iet-cvi.2018.5590
- Type: Article
Recent research on extreme learning machines (ELMs) with multilayer network architectures has led to promising performance with extremely fast training speed for representation learning. In this work, the authors aim to develop an efficient and expressive representation learning method with hierarchical ELM, and propose a novel architectural unit named the double random hidden layers ELM auto-encoder (DELM-AE). The DELM-AE consists of one input layer, two random hidden mapping layers for encoding features, and one output layer for decoding features. By stacking DELM-AEs in a hierarchical structure, they construct an H-DELM model, where the input of the current AE is the feature representation learned by the previous one, but its target output is the original input data rather than its own input. Hence, the H-DELM can reproduce the original input data as closely as possible to learn more expressive and compact features. They validate their method on various widely used public datasets, and the results demonstrate that H-DELM brings significant improvements in classification accuracy and robustness compared with existing relevant multilayer ELM and other deep learning algorithms, at a slight computational cost.
- Author(s): Sung Woo Park and Junseok Kwon
- Source: IET Computer Vision, Volume 13, Issue 4, p. 420 –427
- DOI: 10.1049/iet-cvi.2018.5346
- Type: Article
In this study, the authors propose an object proposal algorithm that can accurately propose object candidate regions at each frame, despite noise in a video. Accordingly, they define three orthogonal planes, namely vertical–horizontal, temporal–vertical, and temporal–horizontal planes. As these planes are orthogonal, they are the most compact planes that can span the spatiotemporal space of a video. Their algorithm selects good object proposals for the vertical–horizontal plane with the help of the object proposal results of the other planes. Experimental results demonstrate that the proposed algorithm produces better object proposals than the baseline algorithm and other state-of-the-art methods. In particular, their method provides more accurate object proposals in challenging environments with severe noise and background clutter. In addition, the object proposal results are utilised for visual tracking problems, and the experimental results show that their visual tracker outperforms recent deep-learning-based trackers.
- Author(s): Jian'an Zhang ; Qi Wang ; Yuan Yuan
- Source: IET Computer Vision, Volume 13, Issue 4, p. 428 –434
- DOI: 10.1049/iet-cvi.2018.5402
- Type: Article
Mahalanobis metric learning is one of the most popular methods for person re-identification. Most existing metric learning methods formulate person re-identification as an unconstrained optimisation problem, and constraints on the Mahalanobis matrix are seldom imposed. In addition, weights are often used to model the relationships between different variables, but they are often limited by their hand-designed nature. Taking the above two disadvantages into consideration, the authors propose a new metric learning method for person re-identification, which formulates the metric learning problem as a constrained optimisation problem by imposing a constraint on the linear transformation matrix. Furthermore, they treat the weights as unknown variables and introduce a weight learning method instead of designing the weights intuitively. Finally, they evaluate the proposed method on two challenging person re-identification databases and show that it performs favourably against the state-of-the-art approaches.
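The link between a Mahalanobis matrix and a linear transformation matrix can be made concrete with the standard identity: with M = L^T L, the squared distance (x - y)^T M (x - y) equals the squared Euclidean norm of L(x - y). The sketch below only evaluates that distance for a given L; the function name is hypothetical, and the constraint on L and the weight learning proposed by the authors are not shown.

```python
def mahalanobis_sq(x, y, transform):
    """Squared Mahalanobis distance with M = L^T L, computed as ||L (x - y)||^2.

    `transform` is the linear transformation matrix L, given as a list of rows.
    """
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(l * d for l, d in zip(row, diff)) for row in transform]
    return sum(p * p for p in projected)


identity = [[1.0, 0.0], [0.0, 1.0]]
stretch_first_axis = [[2.0, 0.0], [0.0, 1.0]]

# With L = I the metric reduces to the squared Euclidean distance.
print(mahalanobis_sq([1.0, 2.0], [4.0, 6.0], identity))
# Scaling the first axis by 2 makes differences along it count four times as much.
print(mahalanobis_sq([1.0, 2.0], [4.0, 6.0], stretch_first_axis))
```

Parameterising the metric through L keeps M positive semi-definite by construction, which is one common reason to learn L rather than M directly.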
- Author(s): Liantao Wang ; Qingwu Li ; Jianfeng Lu
- Source: IET Computer Vision, Volume 13, Issue 4, p. 435 –441
- DOI: 10.1049/iet-cvi.2018.5325
- Type: Article
Since the labelling of the positive images/videos is ambiguous in weakly supervised segment annotation, negative mining-based methods that only use the intra-class information have emerged. In these methods, negative instances are utilised to penalise unknown instances when ranking their likelihood of being an object, which can be considered as voting in terms of similarity. However, these methods (i) ignore the information contained in positive bags and (ii) only rank the likelihood but cannot generate an explicit decision function. In this study, the authors propose a voting scheme involving not only the definite negative instances but also the ambiguous positive instances, to make use of the extra useful information in the weakly labelled positive bags. In the scheme, each instance votes for its label with a magnitude arising from the similarity, and the ambiguous positive instances are assigned soft labels that are iteratively updated during the voting. It overcomes the limitations of voting using only the negative bags. They also propose an expectation kernel density estimation algorithm to gain further insight into the voting mechanism. Experimental results demonstrate the superiority of the authors’ scheme over the baselines.
Plant leaf classification using GIST texture features
Multi-view registration based on weighted LRS matrix decomposition of motions
Video-based person re-identification based on regularised hull distance learning
Bag of words KAZE (BoWK) with two-step classification for high-resolution remote sensing images
Low-rank path-following algorithm for 3D similarity registration
Representation learning by hierarchical ELM auto-encoder with double random hidden layers
Orthogonal object proposal and its application
Metric learning by simultaneously learning linear transformation matrix and weight matrix for person re-identification
Weakly supervised segment annotation via expectation kernel density estimation