Print ISSN 1751-9632
This journal was previously known as IEE Proceedings - Vision, Image and Signal Processing (1994-2006, ISSN 1350-245X).
Latest content
- A monocular image depth estimation method based on weighted fusion and point‐wise convolution
- Author(s): Chen Lei ; Liang Zhengyou ; Sun Yu
pp. 1005–1016 (12 pages)
Abstract: Existing deep-learning-based monocular depth estimation methods struggle to estimate depth near object edges where the depth between objects changes abruptly, and lose accuracy on noisy images. Furthermore, their large numbers of network parameters consume considerable hardware resources. To solve these problems, this paper proposes a depth estimation method based on weighted fusion and point‐wise convolution. The authors design a maximum‐average adaptive pooling weighted fusion (MAWF) module that fuses global and local features, and a continuous point‐wise convolution module that processes the fused features produced by the MAWF module. The two modules are applied jointly three times to perform weighted fusion and point‐wise convolution of the multi‐scale features from the encoder output, which better decodes the depth information of a scene. Experimental results show that the method achieves state‐of‐the‐art performance on the KITTI dataset, with δ1 up to 0.996 and the root-mean-square error reduced by 8%, and demonstrates strong generalisation and robustness.
Graphical abstract: This paper proposes a depth estimation method based on weighted fusion and point‐wise convolution. The authors design a maximum‐average adaptive pooling weighted fusion (MAWF) module that fuses global and local features, and a continuous point‐wise convolution module (CPCM) that processes the fused features from the MAWF module while reducing the number of network parameters, which better decodes the depth information of a scene. Experimental results show that the method achieves state‐of‐the‐art performance on the KITTI dataset.
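The abstract describes the two modules only at a high level. Purely as an illustration of the general pattern (weighted fusion of max- and average-pooled context followed by point-wise convolution), here is a minimal PyTorch sketch; the class name, fusion weighting, and tensor shapes are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class WeightedPoolFusion(nn.Module):
    """Toy weighted fusion of max- and average-pooled global context,
    followed by point-wise convolutions (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        # learnable scalar weights for the two pooling branches
        self.w = nn.Parameter(torch.ones(2))
        self.gmp = nn.AdaptiveMaxPool2d(1)   # global max pooling
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling
        # stacked point-wise (1x1) convolutions keep the parameter count low
        self.pointwise = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.softmax(self.w, dim=0)               # normalised fusion weights
        ctx = a[0] * self.gmp(x) + a[1] * self.gap(x)  # fused global context
        return self.pointwise(x + ctx)                 # broadcast-add, then 1x1 convs

feat = torch.randn(2, 64, 32, 104)         # e.g. a KITTI-sized encoder feature map
print(WeightedPoolFusion(64)(feat).shape)  # torch.Size([2, 64, 32, 104])
```

Point-wise (1×1) convolutions mix channels without spatial kernels, which is consistent with the paper's stated aim of keeping the parameter count low.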
- DASTSiam: Spatio‐temporal fusion and discriminative enhancement for Siamese visual tracking
- Author(s): Yucheng Huang ; Eksan Firkat ; Jinlai Zhang ; Lijuan Zhu ; Bin Zhu ; Jihong Zhu ; Askar Hamdulla
pp. 1017–1033 (17 pages)
Abstract: Deep neural networks have revolutionised object tracking, and Siamese trackers have emerged as a prominent technique for the task. Existing Siamese trackers use a fixed template or a template-updating technique, which is prone to overfitting, cannot exploit global temporal sequences, and cannot utilise multi‐layer features; as a result, they struggle with dramatic appearance changes in complicated scenarios. Siamese trackers also struggle to learn background information, which impairs their discriminative ability. Hence, two transformer‐based modules, the Spatio‐Temporal Fusion (ST) module and the Discriminative Enhancement (DE) module, are proposed to improve the performance of Siamese trackers. The ST module leverages cross‐attention to accumulate global temporal cues and generates an attention matrix with spatio‐temporal similarity to enhance the template's adaptability to changes in target appearance. The DE module associates semantically similar points from the template and the search area, generating a learnable discriminative mask that enhances the discriminative ability of Siamese trackers. In addition, a Multi‐Layer ST module (ST + ML) is constructed, which can be integrated into Siamese trackers based on multi‐layer cross‐correlation for further improvement. The authors evaluate the proposed modules on four public datasets and show competitive performance compared with existing Siamese trackers.
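As a rough sketch of how cross-attention can accumulate temporal cues for a template (the core idea the ST module is described as using), the following PyTorch snippet updates template tokens by attending to features pooled from past frames; all names and shapes here are illustrative assumptions, not the DASTSiam code.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Toy cross-attention between the current template and accumulated
    historical templates (illustrative; not the authors' ST module)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, template: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # template: (B, N, C) tokens of the initial template
        # history:  (B, M, C) tokens accumulated over past frames
        fused, _ = self.attn(query=template, key=history, value=history)
        return self.norm(template + fused)  # residual update of the template

tpl = torch.randn(1, 7 * 7, 256)               # 7x7 template feature tokens
hist = torch.randn(1, 3 * 7 * 7, 256)          # tokens from three past frames
print(CrossAttentionFusion(256)(tpl, hist).shape)  # torch.Size([1, 49, 256])
```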
- 3D layout estimation of general rooms based on ordinal semantic segmentation
- Author(s): Hui Yao ; Jun Miao ; Guoxiang Zhang ; Jun Chu
pp. 855–868 (14 pages)
Abstract: Room layout estimation aims to predict the location and extent of the layout planes of interior spaces. Previous works treat each layout plane as an independent individual without considering the ordinal relation between walls, resulting in missing wall planes and a lack of integrity. This paper proposes a novel two‐branch neural network model to estimate the 3D layouts of cuboid and non‐cuboid rooms. The model embeds the ordinal relation between layout planes into the layout segmentation branch through a proposed ordinal classification loss function, and outputs both pixel‐level layout segmentation maps and layout plane parameter maps. The instance‐level plane parameters of each layout plane are then determined by an instance‐aware pooling layer. Finally, the sharpness of the layout edges in the 2D layout semantic segmentation map is optimised by an improved depth‐map intersection algorithm. Furthermore, the authors annotate a large‐scale 3D room layout estimation dataset, InteriorNet‐Layout, to obtain a stable model. Experiments on synthetic and real‐world datasets show that the proposed method computes faster while maintaining high accuracy. Code is available at https://github.com/Hui‐Yao/3D‐ordinal‐layout‐estimation.
Figure: In the first stage, the network takes a single RGB image as input, predicts plane geometric parameters (H×W×3) with the layout‐plane parameter decoder, and predicts segmentation probability maps (C×H×W) through the layout segmentation branch; the instance‐level parameters (C×3) are then obtained through instance‐aware pooling. In the second stage, the results are optimised by an improved depth‐map intersection algorithm.
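The ordinal classification loss is not spelled out in the abstract. A common way to impose an ordering on class labels is to decompose a K-way ordinal label into K-1 binary "greater than threshold" decisions; the sketch below shows that generic construction in PyTorch and is an assumption about the flavour of loss used, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ordinal_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Toy ordinal classification loss: a pixel with ordinal label y must
    answer 'is y > k?' for every threshold k (illustrative only).
    logits: (B, K-1, H, W) one binary 'greater-than-k' logit per threshold
    labels: (B, H, W) integer ordinal labels in [0, K-1]
    """
    k_minus_1 = logits.shape[1]
    thresholds = torch.arange(k_minus_1, device=labels.device).view(1, -1, 1, 1)
    targets = (labels.unsqueeze(1) > thresholds).float()  # (B, K-1, H, W)
    return F.binary_cross_entropy_with_logits(logits, targets)

logits = torch.randn(2, 4, 60, 80)        # 5 ordinal classes -> 4 thresholds
labels = torch.randint(0, 5, (2, 60, 80))
print(ordinal_loss(logits, labels).item())
```

Unlike a plain cross-entropy, this construction penalises predictions more the further they fall from the true label in the ordering, which is what makes it suitable for encoding the left-to-right order of wall planes.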
- SiamCCF: Siamese visual tracking via cross‐layer calibration fusion
- Author(s): Si Chen ; Huang Huang ; Shunzhi Zhu ; Huarong Xu ; Yifan He ; Da‐Han Wang
pp. 869–882 (14 pages)
Abstract: Siamese networks have attracted wide attention in visual tracking due to their competitive accuracy and speed. However, existing Siamese trackers usually rely on a fixed linear aggregation of feature maps, which does not effectively fuse features from different layers with attention. Moreover, most Siamese trackers calculate the similarity between the template and the search region through a cross‐correlation between the features of the last blocks of the two branches, which may introduce redundant noise. To solve these problems, this study proposes a novel Siamese visual tracking method via cross‐layer calibration fusion, termed SiamCCF. An attention‐based feature fusion module uses local and non‐local attention to fuse features from the deep and shallow layers, capturing both local details and high‐level semantic information. Moreover, a cross‐layer calibration module uses the fused features to calibrate the features of the last network blocks and build cross‐layer long‐range spatial and inter‐channel dependencies around each spatial location. Extensive experiments demonstrate that the proposed method achieves competitive tracking performance compared with state‐of‐the‐art trackers on challenging benchmarks, including OTB100, OTB2013, UAV123, UAV20L, and LaSOT.
Graphical abstract: This study proposes a novel Siamese visual tracking method via cross‐layer calibration fusion, termed SiamCCF. An attention‐based feature fusion module (FFM) first uses local and non‐local attention to fuse features from the deep and shallow layers, capturing both local details and high‐level semantic information. A cross‐layer calibration module (CCM) then uses the fused features to calibrate the features of the last network blocks and build long‐range spatial and inter‐channel dependencies around each spatial location across layers.
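As an illustration of the calibration idea (fused shallow and deep features gating the last-block features), here is a minimal PyTorch sketch; the gating mechanism, module names, and shapes are assumptions rather than SiamCCF's actual FFM/CCM design, which uses local and non-local attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerCalibration(nn.Module):
    """Toy cross-layer calibration: fuse shallow and deep features, then use
    the fusion to gate the last-block features (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, shallow, deep, last):
        # resize the shallow map to the deep map's resolution before fusing
        shallow = F.interpolate(shallow, size=deep.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([shallow, deep], dim=1))
        return last * torch.sigmoid(self.gate(fused))  # calibrated features

s = torch.randn(1, 256, 31, 31)   # shallow-layer features
d = torch.randn(1, 256, 15, 15)   # deep-layer features
l = torch.randn(1, 256, 15, 15)   # last-block features to calibrate
print(CrossLayerCalibration(256)(s, d, l).shape)  # torch.Size([1, 256, 15, 15])
```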
- An encoder‐decoder framework with dynamic convolution for weakly supervised instance segmentation
- Author(s): Liangjun Zhu ; Li Peng ; Shuchen Ding ; Zhongren Liu
pp. 883–894 (12 pages)
Abstract: Instance segmentation is widely employed in industrial robotics and autonomous vehicles, but manually labelling object outlines is time‐consuming. To reduce annotation costs, this article presents a weakly supervised instance segmentation method. A deep convolutional network first constructs multi‐scale feature maps for each object in the input image. The encoder‐decoder framework with dynamic convolution is then used to enhance model capacity and efficiency while avoiding anchor design, proposal selection, and RoIAlign implementation. In particular, Dynamic Heads in the encoder create dynamic convolution kernels, while Instance Heads in the decoder provide the global feature map. With dynamic convolution, each instance can be segmented independently, reducing interference from other instances and improving segmentation accuracy. Under the supervision of a projection loss and a pixel colour‐pairing loss, the contours of each object are finally outlined. On the PASCAL VOC and MS COCO datasets, the proposed method is competitive with more sophisticated approaches; on VOC, it achieves 37.6% average precision with ResNet‐101 and FPN networks. Extensive visualised results demonstrate the effectiveness of the proposed encoder‐decoder framework with dynamic convolution.
Graphical abstract: To efficiently and effectively exploit global image information, the authors propose an encoder‐decoder framework with dynamic convolution (EDDC) for weakly supervised instance segmentation. It primarily consists of backbone and neck subnetworks, together with the Dynamic Head and Instance Head. With only box‐level supervision, EDDC significantly reduces annotation costs and produces high‐quality segmentation results.
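Dynamic convolution in this style (as in CondInst-like heads) generates a separate kernel per instance and convolves it with a shared feature map, so each instance is segmented independently. The PyTorch sketch below shows that mechanism with 1×1 kernels; the function, shapes, and kernel size are illustrative assumptions, not the paper's exact head.

```python
import torch
import torch.nn.functional as F

def dynamic_mask_head(feat: torch.Tensor, kernels: torch.Tensor) -> torch.Tensor:
    """Toy dynamic convolution: each instance gets its own 1x1 kernel and is
    convolved against a shared feature map (illustrative only).
    feat:    (C, H, W) shared feature map from the decoder
    kernels: (N, C) one generated kernel per detected instance
    """
    weight = kernels.view(kernels.shape[0], feat.shape[0], 1, 1)  # (N, C, 1, 1)
    masks = F.conv2d(feat.unsqueeze(0), weight)  # (1, N, H, W) instance logits
    return masks.sigmoid().squeeze(0)            # one soft mask per instance

feat = torch.randn(8, 100, 100)   # shared mask features
kernels = torch.randn(5, 8)       # kernels generated for five instances
print(dynamic_mask_head(feat, kernels).shape)  # torch.Size([5, 100, 100])
```

Because each instance owns its kernel, no anchors, proposals, or RoIAlign crops are needed, which matches the efficiency argument made in the abstract.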
Most downloaded
- Brain tumour classification using two-tier classifier with adaptive segmentation technique
- Author(s): V. Anitha and S. Murugavalli
- Driving posture recognition by convolutional neural networks
- Author(s): Chao Yan ; Frans Coenen ; Bailing Zhang
- Local directional mask maximum edge patterns for image retrieval and face recognition
- Author(s): Santosh Kumar Vipparthi ; Subrahmanyam Murala ; Anil Balaji Gonde ; Q.M. Jonathan Wu
- Fast and accurate algorithm for eye localisation for gaze tracking in low-resolution images
- Author(s): Anjith George and Aurobinda Routray
- ‘Owl’ and ‘Lizard’: patterns of head pose and eye pose in driver gaze classification
- Author(s): Lex Fridman ; Joonbum Lee ; Bryan Reimer ; Trent Victor