IET Image Processing
Volume 14, Issue 13, November 2020
- Author(s): Ahmet Gürhanlı
- Source: IET Image Processing, Volume 14, Issue 13, pp. 2957–2964
- DOI: 10.1049/iet-ipr.2019.0761
- Type: Article
Convolutional neural networks (CNNs) play an important role in image recognition applications. Fast training of image recognition systems is crucial, because the system must be retrained for each new image class, and training these networks involves lengthy calculations. The engineering focus is therefore on obtaining a fast but stable optimisation method. The momentum technique used in backpropagation algorithms resembles the proportional–integral (PI) controller widely employed in automatic control systems: it integrates past errors and helps reach the training targets. The proportional + momentum + derivative (ProMoD) method adds the gradient of the update matrices to the training process, building an optimiser analogous to the widely used PI–derivative controller. The method accelerates movement toward the target accuracy levels by making larger corrections at the beginning, using the differences between successive update matrices. In this research, the ProMoD method is tested on image recognition applications and CNNs. The Modified National Institute of Standards and Technology (MNIST) and Fashion-MNIST datasets are used to evaluate performance. Experimental results showed that ProMoD might perform much faster in training CNNs and consume proportionally less power compared with the momentum and stochastic gradient descent (SGD) techniques.
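The PID analogy in the abstract can be made concrete. The following is a minimal, hypothetical sketch of a momentum-plus-derivative update: the raw gradient acts as the proportional term, the momentum accumulator as the integral term, and the step-to-step change in the gradient as the derivative term. The coefficients and exact form are illustrative assumptions, not the authors' published ProMoD rule.

```python
def promod_step(w, grad, state, lr=0.1, beta=0.9, kd=0.5):
    """One hypothetical PID-style update (illustrative assumptions only):
    proportional = current gradient, integral = momentum accumulator,
    derivative = change in gradient between consecutive steps."""
    v_prev, g_prev = state
    v = [beta * vp + g for vp, g in zip(v_prev, grad)]   # integral (momentum)
    d = [g - gp for g, gp in zip(grad, g_prev)]          # derivative of gradient
    w_new = [wi - lr * (gi + vi + kd * di)
             for wi, gi, vi, di in zip(w, grad, v, d)]
    return w_new, (v, grad)

# Minimising f(w) = w^2 (gradient 2w) from w = 1.0:
w, state = [1.0], ([0.0], [0.0])
for _ in range(30):
    w, state = promod_step(w, [2 * wi for wi in w], state)
```

The derivative term damps the oscillations that plain momentum produces, which is the intuition behind the "bigger corrections in the beginning" claim.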
- Author(s): Ankit Garg and Ashish Negi
- Source: IET Image Processing, Volume 14, Issue 13, pp. 2965–2975
- DOI: 10.1049/iet-ipr.2019.1032
- Type: Article
The evolution of image retargeting techniques demands the exploitation of multi-operators, since they are capable of preserving the structure and salient objects of the image. However, these multi-operators are mostly based on seam carving combined with scaling or cropping operators, which leads to significant distortions in the retargeted image. This study proposes a new multi-operator scheme with improved seam carving, through the proposed seam-diversion-based image retargeting algorithm, integrated with cropping and warping operators. A total of six different multi-operator schemes are proposed, of which the MO6 technique gave remarkable results in terms of image quality, least distortion, and lowest run-time. To simplify image retargeting operations, an optimised image distance function was formulated which combines bidirectional image Euclidean distance, a dominant colour descriptor, and an energy-based coefficient to bypass seams from the point where seams start clashing and the defined threshold is violated. By integrating cropping and warping into the proposed algorithm, the salient features of the retargeted images are preserved. Typical results are presented which demonstrate the effectiveness of the proposed methods. A user-based subjective analysis has also been carried out, showing that images retargeted using the MO6 technique have high user preference.
- Author(s): Gang Wang; Yong-guang Chen; Min Gao; Suo-chang Yang; Fu-qiang Feng; Bernard De Baets
- Source: IET Image Processing, Volume 14, Issue 13, pp. 2976–2986
- DOI: 10.1049/iet-ipr.2019.0949
- Type: Article
Boundaries play a crucial role in various image-based tasks, but many existing non-learning-based boundary detection methods underperform in recognising authentic boundaries in a complex background. In this study, the authors address this problem using the sparseness-constrained colour-opponent response and the superpixel contrast. First, building on the biologically inspired colour-opponency mechanism, the authors elaborate a method to compute the unbiased sparseness-constrained colour-opponent response. In this procedure, locations showing colour variations are enhanced, while textural locations are preliminarily suppressed by the cue of a local sparseness measure. Second, with the help of superpixel segmentation, the authors present an effective approach to obtain the superpixel contrast map. This approach helps to exploit object shape information in suppressing textures. Consequently, the authors propose a non-learning-based method to detect boundaries in images, combining the unbiased sparseness-constrained colour-opponent response and the overall superpixel contrast map. Experimental results on widely adopted datasets show that the authors' method outperforms most of the competing methods. In particular, compared with the state-of-the-art surround-modulation method, the proposed method obtains comparable performance while consuming much less runtime.
- Author(s): Rethinam Sivaraman; Sundararaman Rajagopalan; John Bosco Balaguru Rayappan; Rengarajan Amirtharajan
- Source: IET Image Processing, Volume 14, Issue 13, pp. 2987–2997
- DOI: 10.1049/iet-ipr.2019.0168
- Type: Article
The utility of true random number generators (TRNGs) in cryptography is not restricted to session key generation, nonce generation, OTP generation, etc. In the proposed work, two ring-oscillator (RO) based TRNG structures, adopting identical and non-identical rings of inverters, are employed on their own for the confusion (scrambling) and diffusion (intensity variation) processes used to encrypt greyscale and RGB images. A Cyclone IVE EP4CE115F29C7 FPGA was utilised to generate a pair of random synthetic images using the two RO architectures, taking a maximum of 520 combinational units and 543 logic registers. The suggested image encryption scheme was tested on 100 greyscale test images of size 256 × 256. This non-chaos-influenced image ciphering resulted in an approximate average entropy of 7.99 and near-zero correlation figures for the greyscale and RGB cipher images. The attack resistance capability was checked by performing various occlusion and noise attacks on encrypted images.
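The confusion/diffusion split described above can be sketched in software. In this minimal sketch a seeded PRNG stands in for the paper's FPGA ring-oscillator TRNG, and the particular choices (position shuffle for confusion, XOR keystream for diffusion) are generic illustrations, not the authors' exact architecture.

```python
import random

def encrypt(pixels, seed=42):
    """Confusion (scramble positions) then diffusion (XOR intensities).
    A seeded PRNG is a software stand-in for the hardware RO-TRNG."""
    rng = random.Random(seed)
    perm = list(range(len(pixels)))
    rng.shuffle(perm)                               # confusion
    scrambled = [pixels[i] for i in perm]
    keystream = [rng.randrange(256) for _ in pixels]
    cipher = [p ^ k for p, k in zip(scrambled, keystream)]  # diffusion
    return cipher, perm, keystream

def decrypt(cipher, perm, keystream):
    scrambled = [c ^ k for c, k in zip(cipher, keystream)]
    plain = [0] * len(cipher)
    for out_pos, src in enumerate(perm):
        plain[src] = scrambled[out_pos]             # undo the shuffle
    return plain

cipher, perm, ks = encrypt([12, 34, 56, 250])
```

A real TRNG-based scheme would derive `perm` and `keystream` from the hardware random stream rather than a reproducible seed.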
- Author(s): Mohankumar Shilpa; Madigondanahalli Thimmaiah Gopalakrishna; Chikkaguddaiah Naveena
- Source: IET Image Processing, Volume 14, Issue 13, pp. 2998–3005
- DOI: 10.1049/iet-ipr.2020.0001
- Type: Article
In this work, the authors propose a method for shadow detection and removal from videos utilising machine learning. In the literature, various algorithms for shadow detection and removal have been reported with their advantages and disadvantages. Some of these algorithms require manual alignment and predefined explicit parameters, yet fail to give precise outcomes in different lighting and environmental surroundings. The authors propose a three-phase framework. In the first stage, key frames are chosen using feature-based K-means clustering, which selects key frames using features such as colour, shape and surface. In the second stage, a two-stage segmentation technique is used to segment the shadows by marking the region of interest. In the final step, threshold-based segmentation is used to remove the shadow from the videos. The performance of the proposed method is compared against state-of-the-art methods, and the proposed strategies are shown to achieve superior results.
- Author(s): Hongxia Gao; Zhanhong Chen; Binyang Huang; Jiahe Chen; Zhifu Li
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3006–3013
- DOI: 10.1049/iet-ipr.2018.5767
- Type: Article
Generative adversarial networks (GANs) are among the most prevalent generative models able to synthesise realistic high-frequency details. However, a mismatch between the input and the output may arise when a GAN is directly applied to image super-resolution. To alleviate this issue, the authors adopt a conditional GAN (cGAN) in this study. The cGAN discriminator attempts to guess whether the unknown high-resolution (HR) image was produced by the generator, with the aid of the original low-resolution (LR) image. They propose a novel discriminator that only penalises at the scale of the patch and, thus, has relatively few parameters to train. The generator of the cGAN is an encoder–decoder with skip connections to shuttle shared low-level information directly across the network. To better maintain the low-frequency information and recover the high-frequency information, they design a generator loss function combining an adversarial loss term and an L1 loss term. The former is beneficial to the synthesis of fine-grained textures, while the latter is responsible for learning the overall structure of the LR input. Experiments revealed that the proposed method can generate HR images with richer details and less over-smoothing.
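The combined generator objective described above is typically written as follows. This is the standard pix2pix-style form; the weighting λ and the exact adversarial term are conventional choices assumed for illustration, not taken from the paper:

```latex
\mathcal{L}_G \;=\;
\underbrace{\mathbb{E}_{x}\bigl[-\log D\bigl(x,\,G(x)\bigr)\bigr]}_{\text{adversarial: fine-grained textures}}
\;+\;
\lambda\,\underbrace{\mathbb{E}_{x,y}\bigl[\lVert y - G(x)\rVert_{1}\bigr]}_{L_1\text{: overall structure of the LR input } x}
```

Here \(x\) is the LR input, \(y\) the ground-truth HR image, \(G\) the generator and \(D\) the patch-level conditional discriminator.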
- Author(s): Cungang Wu and Chao Huang
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3014–3020
- DOI: 10.1049/iet-ipr.2019.1052
- Type: Article
Traditional cardiovascular ultrasound detection using focused ultrasound can decrease the frame rate, which affects diagnostic results. To improve cardiovascular ultrasound detection, this study used ultrasound vector blood flow imaging to raise the image frame rate and introduced planar high-frame-rate imaging technology. This work also studies the image parameters through parameter analysis and combines the advantages of various methods to improve the analysis. In addition, it proposes a high-frame-rate blood flow imaging method and designs simulation experiments to analyse the method's effectiveness. The results show that the proposed algorithm is effective in two-dimensional vector blood flow imaging, can be applied to clinical practice, and can provide a theoretical reference for subsequent related research.
- Author(s): Jun He; Yijia Zhao; Bo Sun; Lejun Yu
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3021–3027
- DOI: 10.1049/iet-ipr.2019.1317
- Type: Article
Image captioning can be treated as a policy gradient problem. A retrieval model to obtain a discriminability score, distinguishing between two images given the caption for one of them, has been proposed previously; the discriminability score and one of the image captioning evaluation metrics were optimised using the policy gradient. Building on this, two methods to evaluate the caption and the caption-generating process, referred to as feedback evaluations, are proposed in this study, and the results of these evaluations are used to improve the model. First, an auxiliary retrieval loss (ARL) is introduced to evaluate the generated caption and improve the discriminability of the model. ARL serves as a feedback evaluation method because it calculates the similarity between the generated caption and convolutional neural network features; with ARL, higher similarity and better discriminability were achieved. Second, an evaluation reward (ER) is introduced to evaluate the captioning process; with ER, the overall evaluation metrics can be improved. A policy gradient was used, and a captioning model could be trained by jointly adjusting the captioning process and the captioning itself. The attention long short-term memory network was trained with ARL and ER successively and demonstrated state-of-the-art performance on the COCO database.
- Author(s): Yepeng Liu; Xuemei Li; Qiang Guo; Caiming Zhang
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3028–3038
- DOI: 10.1049/iet-ipr.2020.0082
- Type: Article
Based on image self-similarity and singular value decomposition (SVD), the authors propose an iterative adaptive global denoising method. To account for the structural differences between image patches, they adaptively determine the size of the search window. In each window, a similar-image-patch matrix is constructed based on a multi-scale similarity measure. To ensure the speed of the method, an adaptive step size and number of image patches are introduced, and all image patches are denoised in different iterations. This not only ensures the speed of the method and suppresses residual noise, but also reduces the artefacts caused by a fixed step size and number of image patches. The problem of image denoising is thereby converted to the estimation of a low-rank matrix. New singular values are estimated according to the noise level, and noise-free similar-image-patch matrices are estimated using them and the corresponding singular vectors. Experimental results show that, compared with state-of-the-art denoising algorithms, this method achieves higher PSNR and FSIM and a good visual effect. The new method can be applied to image and video restoration, target recognition and image classification.
- Author(s): Peng Chen
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3039–3045
- DOI: 10.1049/iet-ipr.2019.1200
- Type: Article
A new machine learning method named the broad learning system (BLS) has been proposed recently. Its simplicity, speed, and good generalisation ability have attracted extensive attention. In this study, by introducing BLS to hyperspectral image (HSI) classification, a minimum class variance BLS (MCVBLS) is proposed. Firstly, to obtain a spectral–spatial representation of the original HSI, spectral–spatial feature learning is performed to take full advantage of the abundant spectral and spatial information of the HSI. Then, the authors use MCVBLS to classify the extracted spectral–spatial features. MCVBLS, in contrast to BLS, fully considers the global data structure and the discriminant information of the data. MCVBLS enhances classification performance by minimising the intra-class distribution structure while maximising the inter-class discriminant information; placing restrictions on the output weights takes more discriminative information and global discriminative structure into consideration. Experiments on three benchmark hyperspectral datasets demonstrate that the proposed MCVBLS method is effective for HSI classification and better than other state-of-the-art methods.
- Author(s): Ceren Guzel Turhan and Hasan Sakir Bilge
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3046–3053
- DOI: 10.1049/iet-ipr.2019.1152
- Type: Article
The performance of generative adversarial network (GAN) and autoencoder (AE) models on images has gathered a great deal of interest in terms of transferring them to the three-dimensional (3D) domain. In this study, the single-image object reconstruction problem is addressed by presenting a novel 2D-to-3D AE model inspired by recent improvements. To benefit from middle-level features, a model with skip connections is constructed by transferring 2D features to the 3D domain. Moreover, the authors consider class-awareness to obtain a category-agnostic model using limited class annotations. Apart from recent 3D reconstruction models, they adapt the intersection-over-union score based objective, used in object segmentation models, to improve reconstruction performance. With all these contributions, they call their model the skipped volumetric class-aware AE (SkipVCAE). In experimental studies, the proposed model obtained higher scores than the given state-of-the-art models. The results demonstrate its performance as both a category-specific and a category-agnostic model owing to its class-aware nature. Further analysis showed that the presented model yields satisfactory results on single-image object modelling compared to its multi-view version, thanks to class-awareness.
- Author(s): Yue Xie; Hanling Zhang; Lijun Li
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3054–3065
- DOI: 10.1049/iet-ipr.2019.0651
- Type: Article
Visual object tracking (VOT) based on discriminative correlation filters (DCF) has received great attention due to its high computational efficiency and robustness. However, DCF-based methods suffer from the problem of model contamination: the tracker drifts into the background due to the uncertainties brought by shifting among peaks, which further leads to model degradation. To deal with occlusions, a novel occlusion-handling tracker based on discriminative correlation filters (OHDCF) is proposed for online visual object tracking, where an occlusion-handling strategy is integrated into the spatial–temporal regularised correlation filters (STRCF). The occlusion-handling tracker follows a hybrid approach to handle partial and complete occlusion. Specifically, the authors first present a function to determine whether occlusion occurs. Then, the proposed filter uses block-based and feature-matching methods to determine whether an object is partially or completely occluded. Following this, different methods are used to track the target in each case. Extensive experiments have been performed on the OTB-100, Temple-Color-128, VOT-2016 and VOT-2018 datasets; the results show that OHDCF achieves promising performance compared to other state-of-the-art trackers. On VOT-2018, OHDCF significantly outperforms STRCF from the challenge with a relative gain of 4.8 in EAO and a gain of 4.6 in accuracy.
- Author(s): Fei Xue; Hongbing Ji; Wenbo Zhang
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3066–3075
- DOI: 10.1049/iet-ipr.2020.0019
- Type: Article
In this work, the authors propose a novel self-supervised learning method based on mutual information to learn representations from the videos without manual annotation. Different video clips sampled from the same video usually have coherence in the temporal domain. To guide the network to learn such temporal coherence, they maximise the mutual information between global features extracted from different clips sampled from the same video (Global-MI). However, maximising the Global-MI leads the network to seek shared content from different video clips and may make the network degenerate to focus on the background of the video. Considering the structure of the video, they further maximise the average mutual information between the global feature and local patches of multiple regions of the video clip (multi-region Local-MI). Their approach, which is called Max-GL, learns the temporal coherence by jointly maximising the Global-MI and multi-region Local-MI. Experiments are conducted to validate the effectiveness of the proposed Max-GL. Experimental results show that the Max-GL can serve as an effective pre-training method for the task of action recognition in videos. Additional experiments for the task of action similarity labelling and dynamic scene recognition also validate the generalisation of the learned representations of the Max-GL.
- Author(s): Leena Silvoster M and Retnaswami Mathusoothana S. Kumar
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3076–3083
- DOI: 10.1049/iet-ipr.2019.0971
- Type: Article
Segmentation of spine magnetic resonance images (MRIs) has become an indispensable process in the diagnosis of lumbar disc degeneration, which causes low back pain. Over the last decade, computer-directed diagnosis of disease, as well as computer-guided spine surgery, has been based on the two-dimensional (2D) analysis of the mid-sagittal slice of MRI. This work proposes an automatic strategy to extract the 3D segmentation of normal and degenerated lumbar intervertebral discs (IVDs) from T2-weighted turbo spin echo MRI of the spine using a connected component (CC) analysis algorithm and statistical shape analysis. The challenges faced by IVD segmentation include (i) partial volume effects, (ii) intensity inhomogeneity, and (iii) grey-level overlap of different soft tissues. The proposed method first pre-processes the dataset to enable the application of the CC algorithm. The CCs (subsets of pixels of the disc) of the spine MRI are extracted, and statistical shape analysis is applied to refine the segmentation results and detect IVDs. Experimental results show a robust segmentation, accomplishing a Dice similarity index of 92.4% and thus achieving a low error rate. Other performance measures such as precision, accuracy, Jaccard index, Jaccard distance, global consistency error, and variation of information were also evaluated. The algorithm is evaluated quantitatively through experiments on a dataset of 15 MRI scans covering different scenarios, such as healthy and degenerate discs, and the proposed method is verified as a promising, accurate method for the automatic segmentation of IVDs.
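The CC extraction step above is standard connected-component labelling. The sketch below shows a generic 4-connected BFS labelling of a binary mask; the paper's pre-processing and statistical shape refinement are not reproduced.

```python
from collections import deque

def connected_components(mask):
    """4-connected component labelling on a binary mask (generic sketch of
    the CC step only). Returns the component count and a label image."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                q = deque([(y, x)])
                while q:                       # flood-fill one component
                    cy, cx = q.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            q.append((ny, nx))
    return count, labels

n, labels = connected_components([[1, 1, 0],
                                  [0, 0, 0],
                                  [0, 1, 1]])
```

Each labelled component would then be a candidate disc region for the shape-analysis refinement.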
- Author(s): Salah Ameer
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3084–3088
- DOI: 10.1049/iet-ipr.2019.1428
- Type: Article
The idea of the proposed image thresholding scheme is simply to consider the histogram as a 2D plot rather than a 1D function. The data can then be represented as a two-row matrix: the first row is the grey levels of the image and the second row is the corresponding histogram values. Multiplying this matrix by its transpose results in a power-type matrix of size 2 × 2. The best threshold is the one producing a power matrix closest to that of the original image. Several combinations of the eigenvalues are suggested. To increase the correlation with the first row of the matrix, the histogram is replaced by the cumulative histogram. It is observed that the trace of the matrix produces the best results. Comparative results show the effectiveness of the proposed schemes.
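The two-row-matrix idea can be sketched as follows. With M = [grey levels; cumulative histogram], the trace of M·Mᵀ summarises the image, and a candidate threshold is scored by how closely its two-level approximation reproduces that trace. The details here (replacing each class by its mean grey level) are illustrative assumptions, not the author's exact scheme.

```python
def trace_threshold(hist):
    """Illustrative trace-matching threshold selection (assumed details).
    M = [levels; cumulative histogram]; trace(M @ M.T) is the 'power'."""
    levels = list(range(len(hist)))
    total = sum(hist)
    cum, s = [], 0
    for h in hist:
        s += h
        cum.append(s)

    def trace(row1, row2):
        # trace of M @ M.T for the 2-row matrix M = [row1; row2]
        return sum(v * v for v in row1) + sum(v * v for v in row2)

    target = trace(levels, cum)
    best_t, best_err = 1, float("inf")
    for t in range(1, len(hist)):
        n_lo = sum(hist[:t])
        m_lo = (sum(l * hist[l] for l in range(t)) / n_lo) if n_lo else 0.0
        m_hi = (sum(l * hist[l] for l in range(t, len(hist))) / (total - n_lo)
                if total > n_lo else 0.0)
        # two-level approximation: each grey level replaced by its class mean
        approx = [m_lo if l < t else m_hi for l in levels]
        err = abs(trace(approx, cum) - target)
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

On a bimodal histogram with modes at levels 2 and 7, the selected threshold falls between the two modes.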
- Author(s): Bo-Lin Jian; Min-Wei Huang; Shin-Hsiung Lee; Her-Terng Yau
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3089–3094
- DOI: 10.1049/iet-ipr.2019.0416
- Type: Article
In this study, a thermal imaging instrument was used to obtain facial thermal image information, which was then used to calculate the number of breaths taken. However, small movements were inevitable, and the first issue addressed was the means by which image calibration and region selection were to be made. To this end, thermal image sequence data calibration was done using a technique that resolves small natural deviations in the nostril area. After these problems had been solved, a Hampel filter was used to process the nostril-area signals. The independent component method was used to filter out the effects of non-respiratory signals, and the least squares method was employed for smoothing. Savitzky–Golay filtering was used to adjust the signal baseline, and the processed nostril-region thermal image signals were compared with standard abdominal breathing band signals. Results showed that the difference in the number of breaths per minute was less than 1.5. The normal respiratory frequency lies within the range 0.1–0.5 Hz. In the calculation of coherence obtained from the MVAR model, the spectral coherence analysis results showed that the methods proposed in this study can substantially enhance the relevance between 0.15 and 0.2 Hz.
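The Hampel filtering step mentioned above is a standard median/MAD outlier filter. The sketch below is the generic textbook form (window half-width and threshold are conventional defaults), not the paper's full signal pipeline.

```python
import statistics

def hampel(signal, k=3, t=3.0):
    """Generic Hampel filter: replace a sample with the local median when it
    deviates from that median by more than t * 1.4826 * MAD (the 1.4826
    factor makes the MAD a consistent estimator of the standard deviation)."""
    out = list(signal)
    for i in range(len(signal)):
        window = signal[max(0, i - k): i + k + 1]
        med = statistics.median(window)
        mad = statistics.median(abs(x - med) for x in window)
        if mad and abs(signal[i] - med) > t * 1.4826 * mad:
            out[i] = med          # spike: replace with the local median
    return out
```

Applied to a breathing-like trace, an isolated spike is replaced while the trend is left untouched.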
- Author(s): Hanlin Tan; Huaxin Xiao; Shiming Lai; Yu Liu; Maojun Zhang
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3095–3104
- DOI: 10.1049/iet-ipr.2020.0041
- Type: Article
The goal of image denoising is to recover a clean image from noisy input(s). For single image denoising, utilising similarities (or priors) within and across an image dataset helps recover clean images. As the noise level increases, using multiple frames becomes feasible, which is defined as burst denoising. In this study, the authors propose a deep residual model with squeeze-and-excitation (SE) modules for burst denoising. Unlike previous methods, the authors' model does not need an explicit aligning procedure, making it lightweight and fast. The network contains a noise estimation convolutional neural network, which makes it capable of blind denoising. Besides, by inverting the image processing pipeline and simulating real noise in bursts, their model can suppress real noise blindly. Since denoising performance is closely related to the noise level, frame displacement, and the number of frames (burst length), intensive experiments including an ablation study are performed. Quantitative results show that the proposed method performs significantly better than the previous state-of-the-art methods V-BM4D and KPN in removing Gaussian noise. Qualitative results show that the proposed method is also effective in removing real noise using bursts and that the SE module is key to reducing blur in the results.
- Author(s): Aniruddha Mazumdar and Prabin Kumar Bora
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3105–3116
- DOI: 10.1049/iet-ipr.2019.1114
- Type: Article
This study proposes a novel deep learning-based method which can detect different types of image editing operations carried out on images. Unlike most of the existing methods, which can only detect the editing operations considered in the training stage, the proposed method can generalise to manipulations not seen during training. The method is based on the classification of image pairs as either similarly or differently processed using a deep siamese neural network. Once the network learns features that can discriminate different editing operations, it can check whether an image has been processed with an editing operation not present in the training stage, using the one-shot classification strategy. An image forgery detection and localisation technique is also proposed using the trained siamese network. The experimental results show the efficacy of the proposed method in detecting different editing operations and its ability to detect and localise image forgeries.
- Author(s): Meng Chang; Qi Li; Zhuang He; Huajun Feng; Zhihai Xu
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3117–3126
- DOI: 10.1049/iet-ipr.2019.1175
- Type: Article
Images often suffer from low visual quality due to poor imaging conditions such as low light or hazy weather. The haze imaging model is widely used for contrast enhancement in hazy daylight conditions, while the retinex model is universal for low-light conditions. Although their forms and applications differ, they can be unified into a more general form through the observation proposed in this study. Based on this model, the authors can estimate the reflection of the scene more accurately in more complex imaging conditions. In this study, the authors propose a simple but effective method for estimating the reflection and enhancing the image contrast based on a general imaging model. To preserve image details and control contrast, the authors introduce a dark boundary and a bright boundary to handle high-light and low-light conditions, and a guided structure-preserving optimisation algorithm is proposed to estimate them. After obtaining the dark and bright boundaries, the reflection is calculated and the image is enhanced accordingly. Different from previous approaches, which were designed for specific applications, the proposed method can be used in more diverse imaging conditions. Experiments show that the proposed method can be applied to many poor imaging conditions and maintains good performance.
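For context, the two models that the study unifies are usually written as follows. These are the standard forms from the dehazing and retinex literature; the paper's general unified form is not reproduced here:

```latex
\underbrace{I(x) \;=\; J(x)\,t(x) \;+\; A\bigl(1 - t(x)\bigr)}_{\text{haze model: radiance } J,\ \text{transmission } t,\ \text{airlight } A}
\qquad\qquad
\underbrace{I(x) \;=\; R(x)\,L(x)}_{\text{retinex model: reflectance } R,\ \text{illumination } L}
```

In both cases \(I\) is the observed image, and the enhancement task amounts to recovering the scene term (\(J\) or \(R\)) from \(I\).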
- Author(s): Alireza Asadi and Mehdi Ezoji
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3127–3133
- DOI: 10.1049/iet-ipr.2019.1147
- Type: Article
The most important part of common algorithms for multi-exposure image fusion (MEF) is the selection of features and metrics appropriate for weight-map extraction. This study presents a structure-based multi-exposure image fusion method employing the phase congruency (PC) of the input image. The main idea behind PC-based analysis is that the locations of key image attributes are at points where frequency components are maximally in phase. PC detects the details of an image invariant to its contrast and also emphasises texture- or structure-based features. In this work, alongside intensity-based maps, the extracted PC-based map is utilised for MEF in a pyramidal manner. Several experiments were conducted on a benchmark dataset including a variety of natural multi-exposed image sequences to evaluate the proposed algorithm. Quantitative evaluations in terms of the MEF structural similarity index and visual quality assessments show that the proposed method achieves better performance and produces comparable fused images in comparison to other approaches.
- Author(s): Binwei Xu; Haoran Liang; Ronghua Liang
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3134–3142
- DOI: 10.1049/iet-ipr.2019.1355
- Type: Article
Video summarisation greatly improves the efficiency of browsing videos and saves storage space. A good video summary should satisfy human visual interestingness and preserve the theme of the original video at the semantic level. Unlike many existing methods that consider only visual features to generate video summaries, this study proposes a method that combines visual and semantic cues to extract important information for dynamic video summarisation. The authors propose visual-verbal saliency consistency to add semantic information and propose a novel attention-motion feature, along with other visual features, to fully represent visual interestingness. Based on the importance score of each frame, calculated by combining these features, they select an optimal subset of segments to generate an important and interesting summary. They evaluate their method using the SumMe and TVSum datasets, and experimental results show that their method generates high-quality video summaries.
- Author(s): Nithya Chidambaram; Pethuru Raj; Karruppuswamy Thenmozhi; Rengarajan Amirtharajan
- Source: IET Image Processing, Volume 14, Issue 13, pp. 3143–3153
- DOI: 10.1049/iet-ipr.2018.5654
- Type: Article
The ever-growing virtualised information technology infrastructure is powered by cloud-centric technology around the world. Cloud-based multimedia storage has become essential for users and business behemoths alike. However, according to a Norton survey, around 3800 breaches were publicly disclosed in 2019, with 4.1 billion records exposed, a 54% rise compared to 2018. Data security is thus a widely quoted barrier for cloud storage. Ciphering confidential images before transmission and subsequent storage in a cloud database needs critical attention for techno-specific applications. In image encryption, chaos-based keys can provide good confusion, but a diffusion process using XOR is vulnerable to chosen-plaintext attack. The proposed colour image encryption scheme innately uses deoxyribonucleic acid (DNA) coding, which blends well with the chaotic cryptosystem for an efficient statistical shift. The ciphered images are stored in authenticated and authorised cloud storage facilities. The experimentation is carried out with the help of Amazon Web Services storage instances. The proposed image encryption scheme offers strong resistance to brute-force, occlusion, statistical and differential attacks and yields near-zero correlation and good entropy.
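The DNA-coding diffusion idea can be sketched as follows. The encoding rule, the logistic-map keystream, and the mod-4 "DNA addition" table below are generic textbook choices assumed for illustration, not the authors' published scheme; the point is that, unlike plain XOR, the base-level addition is still invertible given the key.

```python
BASES = "ACGT"  # one common encoding rule: 00->A, 01->C, 10->G, 11->T

def to_dna(byte):
    """Split a byte into four 2-bit symbols, MSB first."""
    return [BASES[(byte >> s) & 3] for s in (6, 4, 2, 0)]

def from_dna(symbols):
    value = 0
    for b in symbols:
        value = (value << 2) | BASES.index(b)
    return value

def logistic_keystream(x0, n, mu=3.99):
    """Chaotic byte stream from the logistic map x <- mu * x * (1 - x)."""
    x, out = x0, []
    for _ in range(n):
        x = mu * x * (1 - x)
        out.append(int(x * 255.9999))
    return out

def diffuse(pixels, x0=0.31, sign=+1):
    """DNA 'addition' (mod-4 per base) of pixel and keystream codes.
    sign=-1 inverts the operation, i.e. decrypts."""
    key = logistic_keystream(x0, len(pixels))
    out = []
    for p, k in zip(pixels, key):
        mixed = [BASES[(BASES.index(a) + sign * BASES.index(b)) % 4]
                 for a, b in zip(to_dna(p), to_dna(k))]
        out.append(from_dna(mixed))
    return out

cipher = diffuse([10, 200, 33, 0])
```

Running `diffuse` again with `sign=-1` and the same initial condition recovers the plaintext pixels.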
- Author(s): Jianfen Huang ; Liyan Li ; Xiao Wang ; Baoli Lu ; Yuliang Liu
- Source: IET Image Processing, Volume 14, Issue 13, p. 3154 –3160
- DOI: 10.1049/iet-ipr.2019.1095
- Type: Article
Quick response (QR) codes are widely used in many fields, and various recognition approaches have been proposed to improve decoding accuracy. However, the recognition of distorted QR codes with one missing position detection pattern (PDP) remains a problem. In this study, based on vector relationships and structural features, the authors introduce a new method for decoding a distorted QR code with one missing PDP. Three methods, Zxing, Halcon, and the newly proposed method, are used to test decoding capability. For QR codes with one missing PDP, the experimental results show that the proposed method can handle a recognition angle range of up to 110°, while Zxing fails to recognise the code and Halcon's decoding angle is 90°. Notably, the proposed method remains usable in extremely harsh luminance and contrast conditions, e.g. with both reduced by 60%, where Halcon's decoding angle drops to only 35° while the proposed method achieves more than 2.7 times that. The proposed method is also more robust when decoding QR codes with a missing PDP against different backgrounds and in noisy images.
- Author(s): Yuli Fu ; Junwei Xu ; Youjun Xiang ; Zhen Chen ; Tao Zhu ; Lei Cai ; Weihong He
- Source: IET Image Processing, Volume 14, Issue 13, p. 3161 –3168
- DOI: 10.1049/iet-ipr.2019.1654
- Type: Article
As prior knowledge, non-local self-similarity (NSS) has been widely utilised in ill-posed problems. In fact, similar textures appear not only within a single scale, but also across different scales. Unlike most existing patch-based methods that only exploit NSS at the same scale, a multi-scale patch-based image denoising algorithm is proposed in this study. The authors design a multi-scale strategy to expand the search space of block-matching, which increases the probability of finding more similar patches. The weighted nuclear norm minimisation (WNNM) algorithm is then employed to recover latent clean patches. With the addition of the multi-scale framework, the performance of WNNM is improved. The proposed algorithm can be used to solve NSS-based image restoration tasks; this study focuses mainly on image denoising, and the method's effectiveness is demonstrated through experiments on widely used test images.
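The core WNNM step — shrinking each singular value of a matrix of grouped similar patches with a weight inversely proportional to that value — can be sketched as below; the weight formula follows the common WNNM convention, but the constant c and the paper's tuned values may differ:

```python
# Hedged sketch: the weighted singular-value shrinkage at the core of WNNM.
# Weights are inversely proportional to the singular values, so large
# components (structure) are shrunk less than small ones (noise).
# c and eps are illustrative constants, not the paper's tuned values.
import math

def wnnm_shrink(singular_values, n_patches, noise_sigma, c=2.8, eps=1e-8):
    """Soft-threshold each singular value sv_i with its own weight
    w_i = c * sqrt(n) * sigma_n^2 / (sv_i + eps)."""
    out = []
    for sv in singular_values:
        w = c * math.sqrt(n_patches) * noise_sigma ** 2 / (sv + eps)
        out.append(max(sv - w, 0.0))
    return out
```

A large singular value (10.0) is barely reduced, while a small one (1.0) is driven to zero: exactly the behaviour that separates latent clean structure from noise.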
- Author(s): Saadeddine Laaroussi ; Aziz Baataoui ; Akram Halli ; Khalid Satori
- Source: IET Image Processing, Volume 14, Issue 13, p. 3169 –3180
- DOI: 10.1049/iet-ipr.2019.1619
- Type: Article
Image mosaicking is a combination of algorithms that use two or more images to create a single image. The resulting mosaic represents the scene of the source images with a larger field of view. However, since dynamic objects can exist in the overlap regions of these images, ghosting and parallax effects appear and poor results are obtained. To overcome these unwanted effects and achieve better results, a new method is presented in this paper. This approach detects dynamic objects in the common areas by using fractional Brownian motion with a predetermined similarity function, zero-normalised cross-correlation, in place of a noise function. This ensures that a map is created in which each pixel has a unique value based on its surroundings, even in homogeneous areas. Furthermore, the new approach combines the previously computed map with the A* search algorithm for a fast and efficient way to find an optimal seamline. The experimental results were compared with different methods and better results were obtained, as shown by a better seamline quality measure, a resulting mosaic without any artefacts, and a faster computation time.
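The seamline search over a precomputed per-pixel difference map can be sketched with a plain A* (zero heuristic, so it behaves like uniform-cost search); the fBm/ZNCC map construction is not reproduced, and the cost grid is simply a given 2-D list:

```python
# Hedged sketch: A* search for a low-cost vertical seam through a per-pixel
# difference map. Moves: down, down-left, down-right.
import heapq

def astar_seam(cost):
    """Return (total_cost, seam) where seam[r] is the column chosen in row r.
    Heuristic is 0 (admissible), so this is Dijkstra-like but uses the
    same open-list machinery as A*."""
    rows, cols = len(cost), len(cost[0])
    pq = [(cost[0][c], 0, c, (c,)) for c in range(cols)]  # (g, row, col, path)
    heapq.heapify(pq)
    best = {}
    while pq:
        g, r, c, path = heapq.heappop(pq)
        if (r, c) in best and best[(r, c)] <= g:
            continue                      # already expanded more cheaply
        best[(r, c)] = g
        if r == rows - 1:
            return g, list(path)          # reached the bottom row
        for dc in (-1, 0, 1):
            nc = c + dc
            if 0 <= nc < cols:
                heapq.heappush(pq, (g + cost[r + 1][nc], r + 1, nc, path + (nc,)))
    return None
```

On a 3x3 map whose cheap pixels lie on the diagonal, the seam [0, 1, 2] with total cost 3 is found.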
- Author(s): Mao Jiafa ; Huang Wei ; Sheng Weiguo
- Source: IET Image Processing, Volume 14, Issue 13, p. 3181 –3187
- DOI: 10.1049/iet-ipr.2019.1293
- Type: Article
Most existing machine vision-based location methods focus on spatial positioning schemes using one or two cameras along with non-vision sensors. To achieve an accurate location, both schemes require processing a large amount of data. In this study, the authors propose a novel method that requires far less data to be processed for measuring target distance using monocular vision. Based on the geometric model of camera imaging, the parameters of the camera (such as the focal length and equivalent focal length), and the principle of transforming an analogue signal into a digital signal, the authors derive the relationship among the target distance, field of view, equivalent focal length and camera resolution. Experimental results show that the proposed method can measure target distance effectively and accurately.
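The underlying similar-triangles relation of the pinhole model can be sketched as below; the variable names and the route through the sensor's pixel pitch are illustrative, while the paper derives an equivalent relation via the equivalent focal length and camera resolution:

```python
# Hedged sketch: pinhole-model distance from a single camera, assuming the
# target's real size is known.

def target_distance(focal_mm, target_height_mm, target_height_px,
                    sensor_height_mm, image_height_px):
    """Similar triangles: d = f * H / h, where h is the target's image size
    converted from pixels to millimetres via the sensor's pixel pitch."""
    pixel_pitch_mm = sensor_height_mm / image_height_px
    image_size_mm = target_height_px * pixel_pitch_mm
    return focal_mm * target_height_mm / image_size_mm
```

For example, a 1.75 m target imaged 350 px tall by a 50 mm lens on a 24 mm / 2400 px sensor sits 25 m away.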
- Author(s): Shiying Wang and Yan Shen
- Source: IET Image Processing, Volume 14, Issue 13, p. 3188 –3201
- DOI: 10.1049/iet-ipr.2019.1319
- Type: Article
Image fusion aims at aggregating the redundant and complementary information in multiple source images. The most challenging aspect is to design robust features and a discriminant model that enhance saliency information in the fused image. To address this issue, the authors develop a novel image fusion algorithm for preserving the invariant knowledge of the multi-modal image. Specifically, they formulate a unified architecture based on the non-subsampled contourlet transform (NSCT). Their method introduces quadtree decomposition and Bezier interpolation to extract crucial infrared features. Furthermore, they propose a saliency-guided phase congruency-based rule and a local Laplacian energy-based rule for low- and high-pass sub-band fusion, respectively. In this approach, the fused image not only combines the local and global features of the source images to avoid smoothing the edges of the target, but also retains fine-scale details and resists the interference noise of the multi-modal image. Both objective assessment and subjective visual inspection of the experimental results indicate that the proposed algorithm performs competitively in objective evaluation criteria and visual quality.
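A common form of the high-pass fusion rule the abstract mentions — keep, per coefficient, whichever source sub-band carries more local energy — can be sketched on plain 2-D coefficient arrays; the 3x3 window is an assumption, and the NSCT decomposition itself is outside this sketch:

```python
# Hedged sketch: "choose-max local energy" rule for high-pass sub-band fusion.

def local_energy(band, r, c):
    """Sum of squared coefficients in a 3x3 window (borders zero-padded)."""
    rows, cols = len(band), len(band[0])
    e = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                e += band[rr][cc] ** 2
    return e

def fuse_highpass(band_a, band_b):
    """Per-coefficient selection: keep the coefficient whose neighbourhood
    carries more energy, i.e. more salient detail."""
    rows, cols = len(band_a), len(band_a[0])
    return [[band_a[r][c]
             if local_energy(band_a, r, c) >= local_energy(band_b, r, c)
             else band_b[r][c]
             for c in range(cols)] for r in range(rows)]
```

Salient coefficients from either input survive into the fused sub-band, which is why this family of rules preserves edges instead of averaging them away.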
- Author(s): Ferzan Katırcıoğlu
- Source: IET Image Processing, Volume 14, Issue 13, p. 3202 –3214
- DOI: 10.1049/iet-ipr.2020.0393
- Type: Article
In this study, an enhancement process that applies the heat conduction equation for solids and stagnant fluids to colour images is proposed. After colour channel stretching, the RGB colour image is converted to the HSI model. The heat conduction equation is applied to each pixel of the I channel of the HSI colour model. The elements of the resulting feature matrix, called the heat conduction matrix (HCM), can be negative, positive or zero. A small negative HCM value indicates that the pixel's I level needs enhancement, whereas a small positive HCM value means that the I level will be reduced and aligned with its neighbours. High positive or negative values mark the edges of objects, and the I levels of such pixels are left unchanged to protect the edges. In addition, whether the HCM is negative or positive, balanced incrementing and decrementing of the I level ensures that the mean brightness is naturally preserved. Finally, an enhanced image is obtained by converting back from the HSI to the RGB colour model. Experimental results show that this method can enhance colour image details better than other methods.
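One plausible reading of the update rule (an assumption, since the paper's exact discretisation is not given here): compute each pixel's HCM value as its difference from the neighbourhood mean, conduct heat where the magnitude is small, and leave large-magnitude (edge) pixels untouched. The constants k and edge_t are illustrative:

```python
# Hedged sketch of the diffusion step on the intensity (I) channel.
# HCM here is the pixel's difference from its 4-neighbour mean; pixels whose
# |HCM| exceeds an edge threshold are left unchanged to protect edges.

def enhance_intensity(img, k=0.5, edge_t=60):
    rows, cols = len(img), len(img[0])

    def hcm(r, c):
        nb = [img[rr][cc] for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
              if 0 <= rr < rows and 0 <= cc < cols]
        return img[r][c] - sum(nb) / len(nb)  # >0: brighter than neighbours

    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            h = hcm(r, c)
            if abs(h) < edge_t:          # not an edge: conduct heat
                out[r][c] = img[r][c] - k * h
    return out
```

A mildly bright centre pixel is pulled towards its neighbours, while a strongly contrasting (edge-like) pixel is preserved as-is.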
- Author(s): Wenhao He ; Haitao Song ; Yue Guo ; Guibin Bian ; Yuejie Sun ; Xiaowei Zhou ; Xiaonan Wang
- Source: IET Image Processing, Volume 14, Issue 13, p. 3215 –3222
- DOI: 10.1049/iet-ipr.2020.0320
- Type: Article
A challenging aspect of instrument segmentation in robotic surgery is distinguishing different parts of the same instrument. Parts with similar textures are common on a practical instrument and are difficult to distinguish. In this work, the authors introduce an end-to-end recurrent model that comprises a multiscale semantic segmentation network and a refinement model. Specifically, the semantic segmentation network uniformly transforms the input images at multiple scales into a semantic mask, and the refinement model is a single-scale net that recurrently optimises this mask. Through extensive experiments, the authors validate that models with multiscale inputs perform better than those that fuse encoded feature maps and those with spatial attention. Furthermore, the authors verify the effectiveness of the proposed model with state-of-the-art performance on several robotic instrument datasets derived from the MICCAI Endoscopic Vision Challenges.
- Author(s): Mohammadreza Riahi ; Mohammad Eslami ; Seyed Hamid Safavi ; Farah TorkamaniAzar
- Source: IET Image Processing, Volume 14, Issue 13, p. 3223 –3231
- DOI: 10.1049/iet-ipr.2019.1739
- Type: Article
In action recognition, the dynamic image (DI) approach was recently proposed to encode a video signal as a single still image. Since the DI descriptor depends strongly on the first frames, it cannot capture dynamics that occur later in the video, nor long-range dynamics. On the other hand, most video frames are not informative for the task of action recognition, so the authors' intuition is that representing a video using all frames is inefficient. Thus, in this study, they propose to remove the redundancy among frames and, based on information theory, extract a set of processed informative images called key frames. The proposed method extracts sufficient frames regardless of their duration and position within the video. Motivated by this method and the DI, they propose a novel key frames dynamic image (KFDI) approach. Experimental results on the popular UCF11, Olympic Sports, and J-HMDB datasets show the superiority of the proposed KFDI approach over the DI in capturing long dynamics of videos for action recognition; KFDI improves accuracy by 2–6% compared to the DI.
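A dynamic image is commonly computed as a single weighted sum of frames using approximate rank-pooling weights alpha_t = 2t - T - 1. The sketch below shows that collapse step; the information-theoretic key-frame selection that KFDI adds in front of it is not reproduced:

```python
# Hedged sketch: approximate rank pooling, the usual way a dynamic image is
# built as one weighted sum of frames (weights alpha_t = 2t - T - 1, t=1..T).
# KFDI would feed selected key frames, not all frames, into this step.

def dynamic_image(frames):
    """Collapse a list of equal-sized frames (2-D lists) into one image."""
    T = len(frames)
    weights = [2 * t - T - 1 for t in range(1, T + 1)]  # e.g. T=3 -> [-2, 0, 2]
    rows, cols = len(frames[0]), len(frames[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for w, f in zip(weights, frames):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * f[r][c]
    return out
```

For a static clip (all frames identical), the weights sum to zero and the dynamic image vanishes, so only temporal change survives.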
- Author(s): Soumen Biswas and Ranjay Hazra
- Source: IET Image Processing, Volume 14, Issue 13, p. 3232 –3242
- DOI: 10.1049/iet-ipr.2020.0214
- Type: Article
An active contour model for image segmentation is proposed by combining a local binary fitting (LBF) energy function and a modified Laplacian of Gaussian (MLoG) energy function. The MLoG energy function, based on a new boundary indicator function or edge stop function (ESF), is introduced to smooth homogeneous regions and enhance the edge information of objects. The MLoG energy term is incorporated with the LBF energy term to drive the initial contour towards the object boundary. Finally, the penalty term is replaced with a new optimised potential function, which improves the corresponding speed function. By adding an optimised area energy term, the contour is accelerated towards the object boundary. Further, the addition of the MLoG term based on the new ESF makes the proposed model insensitive to the initial contour. Experiments are performed on various real images, MS-COCO 2014 training set images and Segmentation Evaluation Database images shared on the Weizmann Institute of Science website. The proposed model provides better segmentation results than other state-of-the-art models in terms of segmentation accuracy, F-score and CPU execution time. Experimental results also demonstrate the robustness of the proposed model with respect to contour initialisation, intensity inhomogeneity and noise.
- Author(s): Guanzhao Li ; Jianwei Zhang ; Danni Chen
- Source: IET Image Processing, Volume 14, Issue 13, p. 3243 –3253
- DOI: 10.1049/iet-ipr.2019.0476
- Type: Article
For Chinese font images, when all strokes are replaced by pattern elements such as flowers and birds, they become flower–bird character paintings, which are traditional Chinese art treasures. Generating a flower–bird painting requires great effort from professional painters. How can these paintings be generated automatically from font images? There is a huge gap between the font domain and the painting domain, and although many image-to-image translation frameworks have been proposed, they cannot handle this situation effectively. In this study, a novel method called the font-to-painting network (F2PNet) is proposed for font-to-painting translation. Specifically, an encoder equipped with dilated convolutions extracts features of the font image, and the features are then fed into a domain translation module that maps the font feature space to the painting feature space. The acquired features are further adjusted by a refinement module and utilised by the decoder to obtain the target painting. The authors apply adversarial loss and cycle-consistency loss to F2PNet and further propose a new loss term, recognisability loss, which gives the generated painting font-level recognisability. Experiments show that F2PNet is effective and can be used as an unsupervised image-to-image translation framework to solve further image translation tasks.
- Author(s): Wessam M. Salama ; Azza M. Elbagoury ; Moustafa H. Aly
- Source: IET Image Processing, Volume 14, Issue 13, p. 3254 –3259
- DOI: 10.1049/iet-ipr.2020.0122
- Type: Article
Breast cancer is a major cause of mortality amongst women. In this paper, two deep networks, ResNet50 and VGG-16, are utilised and re-trained to recognise two classes rather than 1000 classes, with high accuracy and low computational requirements. In addition, transfer learning and data augmentation are performed to address the lack of labelled data. To obtain better accuracy, a support vector machine (SVM) classifier is used in place of the last fully connected layer. The models' performance is verified using k-fold cross-validation. The proposed techniques are trained and evaluated on three mammographic datasets: the Mammographic Image Analysis Society database, the Digital Database for Screening Mammography (DDSM) and the Curated Breast Imaging Subset of DDSM. The paper presents end-to-end fully convolutional neural networks without any pre-processing or post-processing. The proposed technique of ResNet50 hybridised with an SVM achieves the best performance, specifically on the DDSM dataset, producing 97.98% accuracy, 98.46% area under the curve, 97.63% sensitivity, 96.51% precision, 95.97% F1 score and a computational time of 1.8934 s.
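Replacing the last fully connected layer with an SVM on extracted features can be sketched as below. The CNN feature extractor is omitted, and a minimal hinge-loss linear SVM trained by Pegasos-style subgradient descent stands in for the library SVM classifier the paper uses; the data, constants and damped step size are all illustrative:

```python
# Hedged sketch: a linear SVM head over (pre-extracted) CNN features.
# Pegasos-style stochastic subgradient descent on the regularised hinge loss.

def train_linear_svm(X, y, lam=0.01, epochs=300):
    """y in {-1, +1}; returns (w, b) for the decision rule sign(w.x + b)."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * (t + 100))        # damped, decreasing step
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]   # regularisation shrink
            if margin < 1:                        # hinge-loss violation
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In the paper's pipeline the rows of X would be the pooled features produced by the re-trained ResNet50/VGG-16 backbone for each mammogram.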
- Author(s): Shibai Yin ; Jin Xin ; Yibin Wang ; Anup Basu
- Source: IET Image Processing, Volume 14, Issue 13, p. 3260 –3272
- DOI: 10.1049/iet-ipr.2019.0873
- Type: Article
Existing dehazing methods based on convolutional neural networks estimate the transmission map by treating channel-wise features equally, which lacks flexibility in handling different types of haze information and leads to poor representational ability. Moreover, scene lights are typically predicted under a uniform-illumination prior, which does not hold in real scenes. To solve these problems, the authors propose a dense residual channel attention network (DRCAN) for estimating the transmission map, and use an image segmentation strategy to predict scene lights. Specifically, DRCAN is built from the proposed dense residual block (DRB) and dense residual channel attention block (DRCAB). The DRB extracts hierarchical features with increasing receptive fields, while the DRCAB makes the network focus on features containing heavy haze information. After the transmission map is estimated, fuzzy partition entropy combined with graph cuts is used to segment it into regions covered by different scene lights. This strategy considers not only the fuzzy intensities of the low-contrast transmission map but also spatial correlation. Finally, a clear image is recovered from the transmission map and the varying scene lights. Extensive experiments demonstrate that the method is comparable to most existing methods.
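The final recovery step follows the standard atmospheric scattering model, J = (I - A) / max(t, t0) + A. In the sketch below A is given per pixel, which also covers the paper's per-region scene lights; the lower bound t0 and the absence of output clamping are illustrative choices:

```python
# Hedged sketch: scene recovery under the atmospheric scattering model.
# I: hazy intensity, t: transmission, A: scene light (per pixel, so a
# segmented, region-varying A is supported). t is floored at t0 to avoid
# amplifying noise where transmission is near zero; outputs are not clamped.

def dehaze(I, t, A, t0=0.1):
    rows, cols = len(I), len(I[0])
    return [[(I[r][c] - A[r][c]) / max(t[r][c], t0) + A[r][c]
             for c in range(cols)] for r in range(rows)]
```

With I = 0.7, t = 0.5 and A = 1.0 the recovered value is 0.4; when t falls below t0 the floor takes over, which is what keeps near-zero transmission from blowing up the result.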
- Author(s): Zhenzhou Wang
- Source: IET Image Processing, Volume 14, Issue 13, p. 3273 –3281
- DOI: 10.1049/iet-ipr.2019.1481
- Type: Article
Segmentation of the colour image is challenging because colour information is lost after being projected into the three channels of the colour space. Many state-of-the-art colour image segmentation methods are based on monochrome segmentation in one channel of a colour space. However, the optimal performance of a segmentation method usually cannot be achieved in a single colour space, owing to the complexity and diversity of colour images. In this study, the authors propose to segment the colour image by fusing the slope difference distribution (SDD) clustering results from different colour spaces. For simplicity, the approach is designed for two-label segmentation, and it can easily be generalised to multiple-label segmentation. The proposed approach is compared with state-of-the-art colour image segmentation methods both quantitatively and qualitatively, and experimental results verify its effectiveness.
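The fusion of two-label results from several colour spaces can be sketched as a per-pixel majority vote over binary masks; the paper's actual combination rule for the SDD clustering results may differ:

```python
# Hedged sketch: fusing two-label segmentation masks (e.g. one per colour
# space) by per-pixel majority vote. Ties go to foreground (1).

def fuse_masks(masks):
    """masks: list of equal-sized 2-D binary masks; returns the majority mask."""
    rows, cols = len(masks[0]), len(masks[0][0])
    k = len(masks)
    return [[1 if sum(m[r][c] for m in masks) * 2 >= k else 0
             for c in range(cols)] for r in range(rows)]
```

Each mask here would be the two-label SDD result computed in one colour space (RGB, HSV, Lab, ...), so a pixel is kept as foreground only when most colour spaces agree.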
Accelerating convolutional neural network training using ProMoD backpropagation algorithm
Structure preservation in content-aware image retargeting using multi-operator
Boundary detection using unbiased sparseness-constrained colour-opponent response and superpixel contrast
Ring oscillator as confusion – diffusion agent: a complete TRNG drove image security
Approach for shadow detection and removal using machine learning techniques
Image super-resolution based on conditional generative adversarial network
Blood flow imaging of high frame rate two-dimensional vector in cardiovascular ultrasound detection
Feedback evaluations to promote image captioning
Adaptive iterative global image denoising method based on SVD
Minimum class variance broad learning system for hyperspectral image classification
Class-aware single image to 3D object translational autoencoder
Occlusion-handling tracker based on discriminative correlation filters
Mutual information guided 3D ResNet for self-supervised video representation learning
Efficient segmentation of lumbar intervertebral disc from MR images
Eigenstructure involving the histogram for image thresholding
Analysis of the correlation between infrared thermal sequence images of nostril area and respiratory rate
Denoising real bursts with squeeze-and-excitation residual network
Siamese convolutional neural network-based approach towards universal image forensics
Toward a general model for reflection recovery and single image enhancement
Multi-exposure image fusion via a pyramidal integration of the phase congruency of input images with the intensity-based maps
Video summarisation with visual and semantic cues
Advanced framework for highly secure and cloud-based storage of colour images
Recognition of distorted QR codes with one missing position detection pattern
Multi-scale patches based image denoising using weighted nuclear norm minimisation
Dynamic mosaicking: combining A* algorithm with fractional Brownian motion for an optimal seamline detection
Target distance measurement method using monocular vision
Multi-modal image fusion based on saliency guided in NSCT domain
Colour image enhancement with brightness preservation and edge sharpening using a heat conduction matrix
Multiscale matters for part segmentation of instruments in robotic surgery
Human activity recognition using improved dynamic image
Active contours driven by modified LoG energy term and optimised penalty term for image segmentation
F2PNet: font-to-painting translation by adversarial learning
Novel breast cancer classification framework based on deep learning
Image dehazing with uneven illumination prior by dense residual channel attention network
Robust segmentation of the colour image by fusing the SDD clustering results from different colour spaces