IET Image Processing
Volume 13, Issue 14, 12 December 2019
Guest Editorial: Adversarial Learning in Image Processing
- Source: IET Image Processing, Volume 13, Issue 14, p. 2659–2661
- DOI: 10.1049/iet-ipr.2019.1505
- Type: Article
Survey on GAN-based face hallucination with its model development
- Author(s): Heng Liu ; Xiaoyu Zheng ; Jungong Han ; Yuezhong Chu ; Tao Tao
- Source: IET Image Processing, Volume 13, Issue 14, p. 2662–2672
- DOI: 10.1049/iet-ipr.2018.6545
- Type: Article
Face hallucination aims to produce a high-resolution face image from an input low-resolution face image, which is of great importance for many practical face applications, such as face recognition and face verification. Since the structure of the face image is complex and sensitive, super-resolving a face image is more difficult than generic image super-resolution. Recently, following their great success in the high-level face recognition task, deep learning methods, especially generative adversarial networks (GANs), have also been applied to this low-level vision task. This work provides a model-evolution survey of GAN-based face hallucination. The principles of image resolution degradation and GAN-based learning are presented first. Then, a comprehensive review of state-of-the-art GAN-based face hallucination methods is provided. Finally, comparisons of these methods and discussions of related issues for future research directions are also provided.
Image super-resolution using conditional generative adversarial network
- Author(s): Jiaojiao Qiao ; Huihui Song ; Kaihua Zhang ; Xiaolu Zhang ; Qingshan Liu
- Source: IET Image Processing, Volume 13, Issue 14, p. 2673–2679
- DOI: 10.1049/iet-ipr.2018.6570
- Type: Article
Recently, extensive studies on generative adversarial networks (GANs) have made great progress in single image super-resolution (SISR). However, there still exists a significant difference between reconstructed and real high-frequency details. To address this issue, this study presents an SISR approach based on a conditional GAN (SRCGAN). SRCGAN includes a generator network that generates super-resolution (SR) images and a discriminator network that is trained to distinguish the SR images from ground-truth high-resolution (HR) ones. Specifically, the discriminator network uses the ground-truth HR image as a conditional variable, which guides the network to distinguish the real images from the SR images, facilitating the training of a more stable generator model than a GAN without this guidance. Furthermore, a residual-learning module is introduced into the generator network to counter the loss of detail in SR images. Finally, the network is trained end-to-end by optimising a perceptual loss function. Extensive evaluations on four benchmark datasets, including Set5, Set14, BSD100, and Urban100, demonstrate the superiority of the proposed SRCGAN over state-of-the-art methods in terms of PSNR, SSIM, and visual quality.
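As background, a minimal PyTorch sketch of a conditional discriminator of the kind described here: the candidate image is concatenated channel-wise with the ground-truth HR conditioning image before scoring. The layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of a conditional discriminator (assumed layout, not the
# paper's exact architecture): the candidate image and the ground-truth
# HR conditioning image are concatenated channel-wise before scoring.
import torch
import torch.nn as nn

class CondDiscriminator(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch * 2, 64, 4, stride=2, padding=1),  # 2x channels: image + condition
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, 1),  # real/fake logit
        )

    def forward(self, candidate, condition_hr):
        # Condition on the ground-truth HR image by channel concatenation.
        return self.net(torch.cat([candidate, condition_hr], dim=1))

# Usage: score an SR output against its HR condition.
d = CondDiscriminator()
sr, hr = torch.randn(2, 3, 96, 96), torch.randn(2, 3, 96, 96)
logits = d(sr, hr)  # shape (2, 1)
```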
Learning one-to-many stylised Chinese character transformation and generation by generative adversarial networks
- Author(s): Jiefu Chen ; Yanli Ji ; Hua Chen ; Xing Xu
- Source: IET Image Processing, Volume 13, Issue 14, p. 2680–2686
- DOI: 10.1049/iet-ipr.2019.0009
- Type: Article
Owing to the complex structure and the sheer number of Chinese characters, designing a new Chinese font is very challenging and time-consuming for artists. The generation of Chinese characters and the transformation of font styles have therefore become research hotspots. At present, most models for Chinese character transformation cannot generate multiple fonts, and they perform poorly at imitating fonts. In this article, the authors propose a novel method for Chinese character font transformation and generation based on generative adversarial networks. Their model can generate multiple fonts at once through a font-style-specifying mechanism, and it can simultaneously generate a new font by combining the characteristics of existing fonts.
ACFT: adversarial correlation filter for robust tracking
- Author(s): Hanqiao Huang ; Yufei Zha ; Meiyun Zheng ; Peng Zhang
- Source: IET Image Processing, Volume 13, Issue 14, p. 2687–2693
- DOI: 10.1049/iet-ipr.2018.6672
- Type: Article
Tracking based on correlation filters has demonstrated outstanding performance in recent visual object tracking studies and competitions. However, performance is limited by the boundary effects introduced by the intrinsic circular structure. In this study, a tracker called the adversarial correlation filter tracker (ACFT) is proposed to solve this problem using generative adversarial networks (GANs), which are particularly strong at producing realistic-looking data from noisy inputs. Specifically, a mask is generated by the GAN to assist the conventional correlation filter with spatial regularisation. By overcoming the feature independence of the regularisation used in existing trackers, the GAN mask can effectively identify robust features for representing target variations in the temporal domain. In the spatial domain, background features can be substantially suppressed to obtain an optimised filter for more reliable matching and updating. The authors evaluate the proposed tracker on standard tracking benchmarks, and the experimental results show that it performs favourably against other state-of-the-art trackers in terms of accuracy and robustness.
ASiam: adaptive Siamese regression tracking with adversarial template generation and motion-based failure recovery
- Author(s): Xiaolong Jiang ; Zehao Xiao ; Baochang Zhang ; Xianbin Cao
- Source: IET Image Processing, Volume 13, Issue 14, p. 2694–2705
- DOI: 10.1049/iet-ipr.2018.6699
- Type: Article
Object tracking is challenged by the varying appearance of targets and by real-time requirements. Siamese regression trackers, one of the most popular tracking paradigms, excel in efficiency but lack the adaptability to cope with appearance variations. To improve their adaptability, the authors propose a new adaptive Siamese (ASiam) tracker, which integrates a novel adversarial template generation module and a motion-based failure recovery module. The template generation module exploits the temporal coherence and evolution of target appearance encoded in preceding tracklets and generates an adaptive target template online that approximates the varying target in the coming frame. This module is optimised via adversarial learning to achieve accurate appearance prediction and sharp template quality. The generated template, together with a search region, is fed into a Siamese tracking backbone to compute an appearance response map via dense similarity computation in a sliding-window manner. At frames where Siamese tracking fails, the failure recovery module is invoked to perform deep frame-differencing motion detection, yielding a motion response map. By fusing the different response maps, the drifted tracker can be re-calibrated. Extensive experiments on the OTB2013, OTB2015, and VOT2016 datasets demonstrate the accuracy and efficiency of the proposed tracker.
ApprGAN: appearance-based GAN for facial expression synthesis
- Author(s): Yao Peng and Hujun Yin
- Source: IET Image Processing, Volume 13, Issue 14, p. 2706–2715
- DOI: 10.1049/iet-ipr.2018.6576
- Type: Article
Facial expression synthesis has drawn increasing attention in computer vision, graphics, and animation. Recently, generative adversarial nets (GANs) have offered a new perspective on face synthesis and have had remarkable success in generating photorealistic images and in image-to-image translation. In this study, the authors present an appearance-based facial expression synthesis framework, ApprGAN, which combines shape and texture and introduces cycle consistency and identity mapping into the adversarial learning. Specifically, given an input face image, a pair of shape and texture generators is trained for synthetic shape deformation and expression detail generation, respectively. Extensive experiments on expression synthesis and cross-database synthesis were conducted, together with comparisons with existing methods. Results of expression synthesis and quantitative verification on various databases show the effectiveness of ApprGAN in synthesising photorealistic and identity-preserving expressions and its marked improvement over existing methods.
Adversarial image generation by combining content and style
- Author(s): Songyan Liu ; Chaoyang Zhao ; Yunze Gao ; Jinqiao Wang ; Ming Tang
- Source: IET Image Processing, Volume 13, Issue 14, p. 2716–2723
- DOI: 10.1049/iet-ipr.2019.0103
- Type: Article
Images can be considered the combination of two parts: content and style. The authors' approach leverages this property by extracting a distinctive style from reference images and combining it with new content to generate images. With a well-defined style feature extraction module, they propose a novel framework to generate images with various styles and the same content. To train the style-specific image generation model efficiently, a double-cycle training strategy is proposed: two natural-content pairs are input simultaneously, their style features are extracted and exchanged twice, and the input natural images are reconstructed. Furthermore, a triplet margin loss is applied to the style features extracted from images before and after the style exchange, and an adversarial discriminator forces the style-exchanged images to look real. Experiments on generating licence-plate images, Chinese characters, and shoe or handbag images yield photo-realistic results and remarkably improve the corresponding supervised recognition tasks.
High confidence detection for moving target in aerial video
- Author(s): Yumin Tian ; Chenhui Peng ; Di Wang ; Bo Wan
- Source: IET Image Processing, Volume 13, Issue 14, p. 2724–2734
- DOI: 10.1049/iet-ipr.2018.6388
- Type: Article
Moving target detection and tracking in aerial video is a challenging task because of the moving background, small target sizes, low resolution, and limited onboard computing resources. In this study, a high-confidence detection method based on background compensation and the three-frame-difference method is designed, which can accurately detect moving objects against a dynamic background. First, the authors use local feature extraction and matching for image registration, demonstrating that speeded-up robust feature (SURF) keypoints are suitable for the stabilisation task. Then, they estimate the global camera motion parameters using an affine transformation obtained by the random sample consensus (RANSAC) algorithm. Finally, they detect moving objects by the three-frame-difference method. As frame-difference results generally suffer from empty regions and noise, image quality assessment is added to the three-frame-difference method so that the two higher-quality difference images are selected for the logical AND operation, yielding more accurate moving objects. Moreover, edge detection and morphological processing are integrated to further boost overall detection performance. Extensive empirical evaluations on aerial videos demonstrate that the proposed detector is very promising for various challenging scenarios.
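A minimal sketch of the background-compensated three-frame-difference idea in OpenCV; ORB stands in for the paper's SURF keypoints (SURF requires the opencv-contrib build), and the threshold values are illustrative.

```python
# Sketch of background compensation + three-frame difference (assumption:
# ORB substitutes for the paper's SURF, which needs opencv-contrib).
import cv2
import numpy as np

def compensate(prev_gray, cur_gray):
    """Estimate affine camera motion (RANSAC) and warp prev onto cur."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(prev_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    A, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    h, w = cur_gray.shape
    return cv2.warpAffine(prev_gray, A, (w, h))

def three_frame_diff(f0, f1, f2, thresh=25):
    """Difference frame f1 against motion-compensated neighbours, then AND."""
    d01 = cv2.absdiff(compensate(f0, f1), f1)
    d12 = cv2.absdiff(compensate(f2, f1), f1)
    m01 = cv2.threshold(d01, thresh, 255, cv2.THRESH_BINARY)[1]
    m12 = cv2.threshold(d12, thresh, 255, cv2.THRESH_BINARY)[1]
    return cv2.bitwise_and(m01, m12)  # moving-object mask
```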
Combination of modified U-Net and domain adaptation for road detection
- Author(s): Ming Dong ; Xiangmo Zhao ; Xing Fan ; Chao Shen ; Zhanwen Liu
- Source: IET Image Processing, Volume 13, Issue 14, p. 2735–2743
- DOI: 10.1049/iet-ipr.2018.6696
- Type: Article
Road detection is one of the crucial tasks for scene understanding in autonomous driving. Recently, deep-learning-based methods have grown rapidly and addressed this task well, because they can extract richer features. In this study, the authors treat visual road detection as a per-pixel classification of the given image into road or non-road. The complex illumination encountered in traffic applications gives detection models poor adaptability. They address this problem with a deep network architecture that combines a U-Net-prior network and a domain adaptation model (DAM). U-Net-prior is a modified segmentation network that integrates location and shape priors into U-Net. The DAM reduces the gap between training and test images and is optimised by adversarial learning so that features extracted from different datasets are brought close to each other. The authors validate the effectiveness of each component of the algorithm and compare the overall architecture with other state-of-the-art methods. The results show that the architecture achieves top accuracy with the shortest run time among monocular-vision-based methods, and it also achieves competitive results compared with methods based on other sensors.
PixTextGAN: structure aware text image synthesis for license plate recognition
- Author(s): Shilian Wu ; Wei Zhai ; Yang Cao
- Source: IET Image Processing, Volume 13, Issue 14, p. 2744–2752
- DOI: 10.1049/iet-ipr.2018.6588
- Type: Article
Rapid progress in text image recognition has been achieved with the development of deep-learning techniques. However, comprehensive license plate recognition in real scenes remains a great challenge, since no publicly available large, diverse datasets exist for training deep learning models. This paper aims at synthesising license plate images with generative adversarial networks (GANs), avoiding the collection of vast amounts of labelled data. The authors propose a novel PixTextGAN with a controllable architecture that generates specific character structures for different text regions, producing synthetic license plate images with plausible text details. Specifically, a comprehensive structure-aware loss function is presented to preserve the key characteristics of each character region and thus achieve appearance adaptation for better recognition. Qualitative and quantitative experiments demonstrate the superiority of the proposed method in text image synthesis over state-of-the-art GANs. Further license plate recognition experiments on the ReId and CCPD datasets demonstrate that using images synthesised by PixTextGAN can greatly improve recognition accuracy.
Progressive graph-based subspace transductive learning for semi-supervised classification
- Author(s): Long Chen and Zhi Zhong
- Source: IET Image Processing, Volume 13, Issue 14, p. 2753–2762
- DOI: 10.1049/iet-ipr.2018.6363
- Type: Article
Graph-based transductive learning (GTL) is an efficient semi-supervised learning technique, typically employed when sufficient labelled samples cannot be obtained. Conventional GTL methods generally construct an inaccurate graph in the feature domain and cannot align feature information with label information. To address these issues, an approach called progressive graph-based subspace transductive learning (PGSTL) is proposed. PGSTL gradually finds the intrinsic relationships between samples, aligning features with labels more accurately. Meanwhile, PGSTL develops a feature affinity matrix in a subspace of the original high-dimensional feature space, which effectively reduces the interference of noisy points. The representative relation matrix and the feature affinity matrix are then optimised by an iterative strategy and finally aligned. In this way, PGSTL not only effectively reduces the interference of noisy points but also comprehensively considers the information in both the feature and label domains of the data. Extensive experimental results on various benchmark datasets demonstrate that PGSTL achieves the best performance compared with several state-of-the-art semi-supervised learning methods.
Two-order graph convolutional networks for semi-supervised classification
- Author(s): Fu Sichao ; Liu Weifeng ; Li Shuying ; Zhou Yicong
- Source: IET Image Processing, Volume 13, Issue 14, p. 2763–2771
- DOI: 10.1049/iet-ipr.2018.6224
- Type: Article
Currently, deep learning (DL) algorithms have achieved great success in many applications, including computer vision and natural language processing. Many different kinds of DL models have been reported, such as DeepWalk, LINE, diffusion-convolutional neural networks, and graph convolutional networks (GCNs). The GCN algorithm is a variant of the convolutional neural network and achieves significant superiority by using a first-order localised spectral graph filter. However, only a first-order polynomial in the Laplacian has been approximated and implemented in GCN, which ignores indirect neighbour structure information. This lack of rich structure information reduces the performance of neural networks on graph-structured data. In this study, the authors derive and simplify the formula of two-order spectral graph convolutions to preserve rich local information. Furthermore, they build a layer-wise GCN based on this two-order approximation, i.e. the two-order GCN (TGCN), for semi-supervised classification. With the two-order polynomial in the Laplacian, the proposed TGCN model can assimilate abundant localised structure information of graph data and thus boosts classification significantly. To evaluate the proposed solution, extensive experiments are conducted on several popular datasets, including Citeseer, Cora, and PubMed. Experimental results demonstrate that the proposed TGCN outperforms state-of-the-art methods.
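For context, the standard first-order GCN layer of Kipf and Welling propagates node features as below; the two-order form in the second line, which mixes in two-hop (indirect) neighbour information via the squared propagation matrix, is shown as a plausible shape rather than the authors' exact formulation.

```latex
% First-order GCN layer with \tilde{A} = A + I and \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}:
H^{(l+1)} = \sigma\!\left(\hat{A}\, H^{(l)} W^{(l)}\right),
\qquad
\hat{A} = \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}.

% A plausible two-order form adds the squared propagation matrix,
% mixing in two-hop (indirect) neighbour information:
H^{(l+1)} = \sigma\!\left(\hat{A}\, H^{(l)} W_1^{(l)} + \hat{A}^{2} H^{(l)} W_2^{(l)}\right).
```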
Adversarial auto-encoder for unsupervised deep domain adaptation
- Author(s): Rui Shao and Xiangyuan Lan
- Source: IET Image Processing, Volume 13, Issue 14, p. 2772–2777
- DOI: 10.1049/iet-ipr.2018.6687
- Type: Article
Unsupervised visual domain adaptation aims to train a classifier that works well on a target domain, given labelled source samples and unlabelled target samples. The key issue is how to align features between the source and target domains. Inspired by the adversarial learning in generative adversarial networks, this study proposes a novel adversarial auto-encoder for unsupervised deep domain adaptation. The method combines the auto-encoder with adversarial learning so that the domain similarity and the reconstruction information from the decoder can be exploited to facilitate adversarial domain adaptation in the encoder. Extensive experiments on various visual recognition tasks show that the proposed method performs favourably against competitive state-of-the-art methods.
SDCA: a novel stack deep convolutional autoencoder – an application on retinal image denoising
- Author(s): Swarup Kr Ghosh ; Biswajit Biswas ; Anupam Ghosh
- Source: IET Image Processing, Volume 13, Issue 14, p. 2778–2789
- DOI: 10.1049/iet-ipr.2018.6582
- Type: Article
Retinal fundus images are used for the diagnosis and treatment of various eye diseases such as diabetic retinopathy, glaucoma, and exudates. Various noises introduced while capturing the image make the retinal vasculature difficult to investigate, so noise removal is an important step towards better visibility and diagnosis of noisy fundus images in ophthalmology. This study presents a deep-learning-based approach to denoising images and restoring features using a stacked denoising convolutional autoencoder. The proposed scheme restores the structural details of the fundus while decreasing the noise level. Furthermore, the proposed model utilises shared layers to reduce the noise level of the target image at minimal computational cost. To restore an image, the model is trained on patches in a one-to-one manner without loss of information. To assess the denoising effect of the proposed scheme, several standard fundus databases, such as DRIVE, STARE, and DIARETDB1, are tested in this study. Compared with state-of-the-art methods, the proposed scheme gives better results in both qualitative and quantitative analyses.
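A minimal sketch of a denoising convolutional autoencoder trained on noisy/clean patch pairs, in the spirit of the approach described; the layer sizes and the Gaussian noise model are illustrative assumptions, not the paper's SDCA design.

```python
# Minimal sketch of a denoising convolutional autoencoder (illustrative
# layout, not the paper's SDCA architecture).
import torch
import torch.nn as nn

class DenoisingCAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training pairs: noisy patch in, clean patch out.
model = DenoisingCAE()
clean = torch.rand(8, 1, 64, 64)                               # clean fundus patches
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)    # synthetic noise
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
```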
Modified tropical algebra based median filter for removing salt and pepper noise in digital image
- Author(s): Achmad Abdurrazzaq ; Ismail Mohd ; Ahmad Kadri Junoh ; Zainab Yahya
- Source: IET Image Processing, Volume 13, Issue 14, p. 2790–2795
- DOI: 10.1049/iet-ipr.2018.6201
- Type: Article
Noise is information damage that may occur in an image due to changes to the data during transmission. To overcome this problem, the image must be filtered, and many filtering algorithms have been proposed to remove noise; most existing methods, however, only work for low noise levels. In this study, the authors propose an efficient and easy-to-understand filtering algorithm using the concepts of tropical algebra and singular value decomposition (SVD). The SVD is used to detect noise in 3 × 3 templates. If noise is detected, a new pixel value is obtained using tropical operations. The results of this study show that the proposed method outperforms existing methods both quantitatively and visually.
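A hedged sketch of the two-stage idea (SVD-based noise detection on 3 × 3 templates, then a replacement built from tropical, i.e. max-plus, operations). Both the detection criterion and the replacement rule below are illustrative stand-ins, since the paper defines its own versions of each.

```python
# Hedged sketch of the two stages the abstract describes; the actual SVD
# criterion and tropical update are the authors' own, so both the
# threshold test and the max-plus replacement here are illustrative.
import numpy as np

def is_noisy(patch3x3, ratio_thresh=0.5):
    """Flag the centre pixel as impulsive if the 3x3 patch is far from
    rank-1, i.e. secondary singular values carry much of the energy."""
    s = np.linalg.svd(patch3x3.astype(float), compute_uv=False)
    return (s[1:].sum() / max(s.sum(), 1e-9)) > ratio_thresh

def tropical_replace(patch3x3):
    """Illustrative combination of the neighbours using the tropical
    extremes: tropical 'addition' is max, 'multiplication' is +."""
    neigh = np.delete(patch3x3.flatten(), 4)   # drop the centre pixel
    lo, hi = neigh.min(), neigh.max()          # min-plus / max-plus extremes
    return (int(lo) + int(hi)) // 2            # midrange as the new pixel

def filter_image(img):
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            if is_noisy(patch):
                out[i, j] = tropical_replace(patch)
    return out
```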
Dense small face detection based on regional cascade multi-scale method
- Author(s): Xiao Ke ; Jianping Li ; Wenzhong Guo
- Source: IET Image Processing, Volume 13, Issue 14, p. 2796–2804
- DOI: 10.1049/iet-ipr.2018.6571
- Type: Article
In the field of object detection, the problem of detecting small faces has been researched extensively, but when objects of clearly different scales appear in an image, detection performance suffers, owing to the scale-invariance properties of deep convolutional neural networks. Although feature-pyramid-based methods such as FPN and SNIP have been proposed in recent years to address this problem, they have not solved it fundamentally. Here, a regional cascade multi-scale detection method is proposed. First, a global detector and several local detectors are trained: the global detector on the original training set, and the local detectors on sub-training sets generated from it. Second, the global detector detects objects roughly, and the local detectors produce more detailed results that improve its performance. Finally, non-maximum suppression is used to integrate the detection results of the global and local detectors into the output. The method can be applied to any deep object detection model, has good scalability, and is particularly suitable for dense face detection.
Fusing texture, edge and line features for smoke recognition
- Author(s): Feiniu Yuan ; Gang Li ; Xue Xia ; Bangjun Lei ; Jinting Shi
- Source: IET Image Processing, Volume 13, Issue 14, p. 2805–2812
- DOI: 10.1049/iet-ipr.2019.0012
- Type: Article
To improve recognition accuracy, the authors fuse texture, edge, and line information in a feature extraction method for smoke recognition. The Canny operator is used to generate an edge image from the original image, and the Hough transform then extracts straight lines from the edge image. The lines are rasterised into a discrete line image, and two local patterns are proposed for the edge and line images. The first, the local boundary summation pattern (LBSP), computes the sum of binary pixel values along the boundary of a local region around a centre pixel. The second, the local region summation pattern (LRSP), sums the binary values of pixels within a local region around the centre pixel. Besides LBSP and LRSP, LBPs with three mapping modes (LBP_M3) are also extracted to capture traditional texture information. Finally, the authors concatenate the histograms of LBP_M3, LBSP, and LRSP into a feature vector and use a support vector machine for classification and testing. Experiments show that this method outperforms most existing traditional methods for smoke recognition. Despite its low-dimensional features, it also performs well on multi-class texture classification.
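A minimal sketch of the texture/edge/line extraction pipeline in OpenCV and scikit-image. A uniform LBP histogram stands in for the paper's LBSP, LRSP, and LBP_M3 patterns, and the Canny/Hough parameters are illustrative; the resulting vector would then feed a support vector machine (e.g. sklearn.svm.SVC).

```python
# Sketch of the fused feature extraction the abstract describes: Canny
# edges, Hough lines rasterised to a line image, and LBP histograms
# computed on the texture, edge, and line maps (LBP is a stand-in for
# the paper's LBSP/LRSP/LBP_M3 patterns).
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def smoke_features(gray):
    edges = cv2.Canny(gray, 100, 200)                       # edge image
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=5)
    line_img = np.zeros_like(gray)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(line_img, (x1, y1), (x2, y2), 255, 1)  # rasterised lines
    feats = []
    for img in (gray, edges, line_img):                     # texture/edge/line maps
        lbp = local_binary_pattern(img, P=8, R=1, method='uniform')
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        feats.append(hist)
    return np.concatenate(feats)                            # fused feature vector
```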
Capturing the spatio-temporal continuity for video semantic segmentation
- Author(s): Xin Chen ; Aming Wu ; Yahong Han
- Source: IET Image Processing, Volume 13, Issue 14, p. 2813–2820
- DOI: 10.1049/iet-ipr.2018.6479
- Type: Article
In recent years, image semantic segmentation based on convolutional neural networks has achieved many advances. However, the development of video semantic segmentation has been relatively slow: directly applying image segmentation algorithms to each video frame separately ignores the temporal region continuity inherent in videos. In this study, the authors propose a novel deep neural network architecture with a newly devised spatio-temporal continuity (STC) module for video semantic segmentation. The architecture comprises an encoding network, an STC module, and a decoding network. The encoding network extracts a high-level feature map, which the STC module takes as input to extract the STC feature map. For decoding, four dilated convolutional layers obtain a more abstract representation, and a deconvolutional layer increases its size. Finally, the current and previous feature representations are fused to yield the class probabilities. The architecture thus receives a sequence of consecutive video frames and outputs the segmentation result of the current frame. The authors extensively evaluate the proposed approach on the CamVid and KITTI datasets. Compared with other methods, their approach not only achieves competitive performance but also has lower complexity.
Aesthetic assessment of paintings based on visual balance
- Author(s): Mao Li ; Jiancheng Lv ; Chenwei Tang
- Source: IET Image Processing, Volume 13, Issue 14, p. 2821–2828
- DOI: 10.1049/iet-ipr.2018.6572
- Type: Article
All things follow the rule of cosmic equilibrium, yet visual balance, an important factor in evaluating the aesthetic effect of images, has not attracted much attention. In this study, a novel method is proposed to quantify the law of visual balance and automatically evaluate the aesthetic value of images. The authors first analyse the colour composition of images using K-means clustering. Then, based on information aesthetics theory and perspective principles, a calculation method is proposed to find the visual centre of gravity of an image. Finally, the aesthetic effect of the image is evaluated according to the positional relationship between the visual centre of gravity and the physical centre of the image. Extensive experimental results demonstrate, qualitatively and quantitatively, that the proposed evaluation method is largely consistent with the intuitive experience of most human observers. Moreover, experiments on a large and diverse benchmark dataset show results competitive with the current state of the art.
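A hedged sketch of the two steps named in the abstract: K-means colour clustering followed by a weighted visual centre of gravity compared against the physical centre. The per-pixel weighting below is an illustrative stand-in for the paper's information-aesthetics weighting.

```python
# Sketch: K-means colour composition, then a visual centre of gravity.
# The saliency weighting (colour distance from the mean colour) is an
# illustrative assumption, not the paper's formulation.
import numpy as np
from sklearn.cluster import KMeans

def visual_centre_of_gravity(img, k=5):
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(pixels)

    # Illustrative per-pixel visual weight: distance of the pixel's
    # cluster colour from the global mean colour.
    centres = np.array([pixels[labels == c].mean(axis=0) for c in range(k)])
    weight = np.linalg.norm(centres[labels] - pixels.mean(axis=0), axis=1)
    weight = weight.reshape(h, w) + 1e-9

    ys, xs = np.mgrid[0:h, 0:w]
    cy = (ys * weight).sum() / weight.sum()
    cx = (xs * weight).sum() / weight.sum()
    # Balance score: normalised offset of the visual centre from the
    # physical centre; values near 0 suggest good balance.
    offset = np.hypot(cy - h / 2, cx - w / 2) / np.hypot(h / 2, w / 2)
    return (cx, cy), offset
```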
Homography-based traffic sign localisation and pose estimation from image sequence
- Author(s): Zhichao Cui ; Yuehu Liu ; Fuji Ren
- Source: IET Image Processing, Volume 13, Issue 14, p. 2829–2839
- DOI: 10.1049/iet-ipr.2019.0023
- Type: Article
This study proposes a vision-based method for estimating traffic sign attributes, i.e. 3D position and pose, from image sequences captured by binocular or monocular cameras. The method starts by acquiring robust feature correspondences from image pairs based on homography constraints. An objective function then integrates the feature correspondences to optimise the parameters of the traffic sign plane in 3D coordinates. Finally, the sign plane is used for attribute estimation. In addition, the authors provide an extension to the raw KITTI dataset that can be used for 3D traffic sign localisation and pose estimation tasks. In the experiments, three popular methods are employed for comparison on the publicly available BelgiumTS and KITTI datasets. The results show that the authors' method based on SIFT and SURF features can locate traffic signs with mean errors of ∼0.44 and 0.51 m on the BelgiumTS and KITTI datasets, respectively, and estimate the pose with a mean error of ∼14.45° on the KITTI dataset.
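As background, a minimal sketch of homography-constrained feature correspondence between an image pair using standard OpenCV calls (SIFT has been in mainline OpenCV since 4.4); the ratio and RANSAC threshold values are illustrative.

```python
# Sketch of homography-constrained correspondences between an image pair.
import cv2
import numpy as np

def homography_correspondences(img1, img2, ratio=0.75):
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    # Lowe ratio test on 2-NN matches.
    knn = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC homography; the inlier mask keeps only correspondences
    # consistent with a single plane (e.g. the sign plate).
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    inliers = [g for g, keep in zip(good, mask.ravel()) if keep]
    return H, inliers
```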
Single sample description based on Gabor fusion
- Author(s): Ting Chen ; Tao Gao ; Xiangmo Zhao
- Source: IET Image Processing, Volume 13, Issue 14, p. 2840–2849
- DOI: 10.1049/iet-ipr.2018.6665
- Type: Article
Owing to the lack of sufficient face images and the resulting failure of many traditional face recognition algorithms, face recognition with a single training sample is a great challenge. To solve this problem, this study proposes a novel local weighted fusion Gabor (LWFG) algorithm. First, the single sample is segmented into a series of block sub-images, and each sub-image is decomposed into multi-resolution Gabor wavelets with multiple orientations and scales. Second, Gabor wavelets of different orientations at the same scale are fused, and then wavelets of different scales at the same orientation are fused according to the proposed fusion criterion. Third, fused Gabor feature histograms are calculated in each of the divided local regions, and each region's information importance is measured by the proposed local image information content model. Finally, the fused Gabor wavelet histograms are adaptively weighted by a weighting map calculated from the information content model. Simulation experiments were conducted on different face databases under conditions including partial occlusion, expression change, and illumination variation. The results indicate that the proposed LWFG algorithm is more effective with a single training sample.
Blind text images deblurring based on a generative adversarial network
- Author(s): Qing Qi and Jichang Guo
- Source: IET Image Processing, Volume 13, Issue 14, p. 2850–2858
- DOI: 10.1049/iet-ipr.2018.6697
- Type: Article
Recently, text image deblurring has advanced considerably. Unlike previous methods that rely on hand-crafted priors or assume a specific kernel, the authors treat text deblurring as a semantic generation task, which can be achieved by a generative adversarial network. Structure is an essential property of text images; thus, the authors propose a structural loss function and a detail loss function to regularise the recovery of text images. Furthermore, following a coarse-to-fine strategy, they present a multi-scale generator that sharpens the generated text images. The model is robustly capable of generating realistic latent images with photo-quality detail. Extensive experiments on synthetic and real-world blurry images show that the proposed network is comparable to state-of-the-art methods.
Salient object detection with adversarial training
- Author(s): Zhijie Wang ; Wei Zhang ; Xuewen Rong ; Yibin Li
- Source: IET Image Processing, Volume 13, Issue 14, p. 2859–2865
- DOI: 10.1049/iet-ipr.2018.6581
- Type: Article
Generative adversarial networks have been shown to produce state-of-the-art results in image generation. In this study, the authors propose a novel adversarial training method for salient object detection (SOD) models. They train a convolutional SOD network along with a gated adversarial network that discriminates saliency maps coming either from the ground truth or from the SOD network. The motivation is that the adversarial network can detect and correct pixel-wise errors between ground-truth saliency maps and those produced by the convolutional network. Experiments show that the adversarial training approach leads to state-of-the-art performance on the MSRA-B, extended complex scene saliency, HKU-IS, DUT, and SOD datasets.
Supervised fusion approach of local features extracted from SAR images for detecting deforestation changes
- Author(s): Abdelkader Horch ; Khalifa Djemal ; Abdelkader Gafour ; Nasreddine Taleb
- Source: IET Image Processing, Volume 13, Issue 14, p. 2866–2876
- DOI: 10.1049/iet-ipr.2019.0122
- Type: Article
Deforestation, the continuous regression of forested areas around the world, has become a major problem, and efficient detection of these changes has become more than necessary. In this work, a new method for deforestation change detection is proposed, based on a supervised fusion of local texture features extracted from SAR images. ALOS PALSAR (Advanced Land Observation Satellite Phased Array type L-band Synthetic Aperture Radar) multi-temporal data were used. Normalised radar cross-section (NRCS) and polarimetric features extracted from HH and HV polarised data allow different categories of land cover to be recognised, termed the NRCS classification. Grey-level co-occurrence matrix (GLCM) texture features were extracted using different moving window sizes applied to local regions previously obtained by binarising the NRCS results. A total of 300 region samples and five GLCM characteristics were used. Deforestation appears clearly in the resulting images, with very satisfactory precision on the detected regions, and the proposed supervised approach yields very good deforestation change detection results.
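As background, a minimal sketch of windowed GLCM feature extraction with scikit-image (graycomatrix/graycoprops, spelled greyco* before skimage 0.19); the window size, quantisation level, and choice of five properties are illustrative assumptions, not the paper's configuration.

```python
# Sketch of GLCM texture extraction for one moving window of a
# quantised SAR intensity image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

PROPS = ('contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation')

def glcm_features(window, levels=32):
    """Five GLCM statistics for one window (illustrative choices)."""
    q = (window / window.max() * (levels - 1)).astype(np.uint8)  # quantise
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)
    return np.array([graycoprops(glcm, p)[0, 0] for p in PROPS])

# Example: features for one 15x15 region of an intensity image.
region = np.random.randint(0, 255, (15, 15)).astype(float)
print(glcm_features(region))
```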
Statistical multidirectional line dark channel for single-image dehazing
- Author(s): Sebastián Salazar Colores ; Eduardo Ulises Moya-Sánchez ; Juan-Manuel Ramos-Arreguín ; Eduardo Cabal-Yépez
- Source: IET Image Processing, Volume 13, Issue 14, p. 2877–2887
- DOI: 10.1049/iet-ipr.2018.6403
- Type: Article
Outdoor scenes often contain atmospheric degradation, such as fog or haze, which deteriorates the performance of tracking, autonomous driving, and surveillance systems, among others, making dehazing methods an area of considerable interest. However, some dehazing techniques are computationally demanding, creating a trade-off between time consumption and restoration quality. A new, less time-consuming method is proposed for improving outdoor images affected by haze. The proposed method is inspired by the Radon transform and tailored to dehazing by computing the dark channel, supplemented by statistical computations and a heuristic to avoid saturated areas. The results were evaluated with a reduced-reference image quality dehazing assessment and the full-reference metrics structural similarity (SSIM) index and peak signal-to-noise ratio (PSNR), on real-world and synthetic outdoor images. The results demonstrate that the proposed method strikes an adequate balance between new visible edges, increased gradient, and saturated pixels, while obtaining at least a 5% increase in SSIM and a 16% increase in PSNR, and running 5.37 times faster than four dehazing methods recently introduced in the literature.
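The dark channel referenced here is the standard construct from the dark channel prior literature (He et al.): the per-pixel minimum across colour channels followed by a local minimum filter, commonly implemented via erosion. A minimal sketch, with an illustrative patch size:

```python
# Standard dark channel: per-pixel colour minimum, then a local minimum
# filter implemented with morphological erosion.
import cv2
import numpy as np

def dark_channel(img_bgr, patch=15):
    """dark(x) = min over a patch of the per-pixel minimum across B,G,R."""
    min_channel = img_bgr.min(axis=2)                  # per-pixel colour minimum
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_channel, kernel)              # local minimum filter

# Haze-free regions have a dark channel near zero; bright values flag haze.
img = cv2.imread('hazy.jpg')
dc = dark_channel(img)
```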
New over-relaxed monotone fast iterative shrinkage-thresholding algorithm for linear inverse problems
- Author(s): Tao Zhu
- Source: IET Image Processing, Volume 13, Issue 14, p. 2888–2896
- DOI: 10.1049/iet-ipr.2019.0600
- Type: Article
The over-relaxed monotone fast iterative shrinkage-thresholding algorithm (OMFISTA) must satisfy a complex convergence condition with respect to its additional parameters. To simplify this condition, this study proposes a new OMFISTA, termed OMFISTAv2, with a parameter-setting strategy that yields a simple sufficient condition on the additional parameters to guarantee convergence. Moreover, the author finds experimentally that OMFISTAv2 can accelerate MFISTA in some cases where the system matrix is ill-conditioned or rank-deficient, while OMFISTA cannot.
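For reference, the underlying FISTA iteration of Beck and Teboulle that MFISTA and OMFISTA build on, for minimising $F(x) = f(x) + g(x)$ with an $L$-Lipschitz gradient of $f$; the monotone and over-relaxed variants modify how $x_k$ is accepted and combined, governed by the additional parameters analysed in the paper.

```latex
x_k = \operatorname{prox}_{g/L}\!\left(y_k - \tfrac{1}{L}\nabla f(y_k)\right),
\qquad
t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2},
\qquad
y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}\,(x_k - x_{k-1}).
```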
HPILN: a feature learning framework for cross-modality person re-identification
- Author(s): Yun-Bo Zhao ; Jian-Wu Lin ; Qi Xuan ; Xugang Xi
- Source: IET Image Processing, Volume 13, Issue 14, p. 2897–2904
- DOI: 10.1049/iet-ipr.2019.0699
- Type: Article
Most video surveillance systems use both RGB and infrared cameras, making re-identification of a person across the RGB and infrared modalities a vital technique. This task is challenging owing to both the cross-modality variations caused by the heterogeneous RGB and infrared images, and the intra-modality variations caused by heterogeneous human poses, camera positions, lighting brightness, etc. To meet these challenges, a novel feature learning framework, the hard pentaplet and identity loss network (HPILN), is proposed. In this framework, existing single-modality re-identification models are modified to fit the cross-modality scenario, after which a specifically designed hard pentaplet loss and an identity loss are used to increase the accuracy of the modified cross-modality re-identification models. Extensive experiments on the SYSU-MM01 benchmark show that the authors' method outperforms all existing ones in terms of the cumulative match characteristic curve and mean average precision.
Image encryption algorithm based on image hashing, improved chaotic mapping and DNA coding
- Author(s): Qiuyu Zhang ; Jitian Han ; Yutong Ye
- Source: IET Image Processing, Volume 13, Issue 14, p. 2905–2915
- DOI: 10.1049/iet-ipr.2019.0667
- Type: Article
To improve the security and robustness of existing image encryption algorithms and to reduce their vulnerability to statistical analysis, differential attacks, exhaustive attacks, cropping, and noise attacks, a novel image encryption algorithm based on image hashing, an improved chaotic map, and DNA coding is proposed. First, the original image and a fingerprint image are pre-processed, their features are extracted, and the images are evenly divided into blocks. Second, the features are turned into a binary hash sequence by the image hashing algorithm, and this sequence is iterated as the initial parameter of the improved chaotic map and Chen's chaotic system. Finally, Chen's chaotic system generates a random sequence, and the matrix generated by the improved chaotic map together with the original image undergoes DNA computation and encoding to obtain the encrypted image. Experimental results show that the proposed algorithm performs better in terms of security, with a larger key space and higher key sensitivity; the pixel correlation coefficient is close to 0, the information entropy is close to 8, and the unified average changing intensity (UACI) and number of pixels change rate (NPCR) values are close to their ideal values; the algorithm is also more robust against noise and cropping attacks.
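The abstract does not specify the improved chaotic map, so as background only, a keystream generator built on the classic logistic map that such schemes typically start from; all parameter values are illustrative.

```python
# Background sketch: byte keystream from the classic logistic map
# x <- r*x*(1-x), which improved chaotic maps build on. Parameters are
# illustrative only, not the paper's scheme.
import numpy as np

def logistic_keystream(x0, r=3.99, n=1024, burn_in=200):
    """Byte keystream from logistic-map iterates (r near 4 is chaotic)."""
    x = x0
    out = np.empty(n, dtype=np.uint8)
    for _ in range(burn_in):          # discard the transient iterates
        x = r * x * (1 - x)
    for i in range(n):
        x = r * x * (1 - x)
        out[i] = int(x * 256) % 256   # quantise the state to a byte
    return out

# Tiny key-sensitivity check: nearby seeds diverge quickly.
a = logistic_keystream(0.3141592653)
b = logistic_keystream(0.3141592654)
print((a != b).mean())                # fraction of differing bytes is high
```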
Defect inspection research on fabric based on template correction and primitive decomposition
- Author(s): Wei Liu ; Xingzhi Chang ; Jiuzhen Liang ; Zhenjie Hou ; Li Xu
- Source: IET Image Processing, Volume 13, Issue 14, p. 2916–2928
- DOI: 10.1049/iet-ipr.2018.6626
- Type: Article
To accurately detect defects in patterned fabrics, a novel detection algorithm combining template correction with primitive decomposition (TCPD) is proposed in this study. First of all, the fabric image is segmented into lattices according to its variation regularity. The authors then propose an effective anisotropy correction method to reduce the interference of stretching and distortion between lattices. Using the proposed primitive decomposition method, each corrected lattice is further divided into graphic elements of smaller granularity; these smaller primitives make the boundaries of the detection results more accurate. Moreover, a self-supervised threshold selection strategy is presented that uses defect-free regions to obtain thresholds, giving each primitive its own criterion for judging defects. Extensive experiments demonstrate that the TCPD method achieves a true positive rate of 0.8127, a positive predictive value of 0.3889, and an F-value of 0.5261 on star-patterned fabrics.
Learning mean progressive scattering using binomial truncated loss for image dehazing
- Author(s): Bin Qiu ; Xiwen Liang ; Zhuo Su ; Ruomei Wang ; Fan Zhou
- Source: IET Image Processing, Volume 13, Issue 14, p. 2929–2939
- DOI: 10.1049/iet-ipr.2019.0261
- Type: Article
In this study, the authors propose a novel progressive dehazing network to address single-image haze removal, based on a new mean progressive scattering model. Unlike methods that learn the atmospheric light and transmission maps with different networks, these two variables are optimised in a unified network. Following the methodology of traditional prior-based methods, which estimate a coarse transmission map first, a progressive refinement branch in the decoder restores the fine-scale transmission map. To improve the prediction accuracy of the transmission map, a novel binomial truncated loss is proposed that weights error values according to the probabilities of error occurrence. An ablation study verifies the effectiveness of the components of the proposed method. Experiments on synthetic datasets and real images demonstrate that the proposed method outperforms other state-of-the-art methods.