IET Image Processing
Volume 14, Issue 14, December 2020
Deep visual unsupervised domain adaptation for classification tasks: a survey

- Author(s): Yeganeh Madadi ; Vahid Seydi ; Kamal Nasrollahi ; Reshad Hosseini ; Thomas B. Moeslund
- Source: IET Image Processing, Volume 14, Issue 14, p. 3283–3299
- DOI: 10.1049/iet-ipr.2020.0087
- Type: Article

Learning methods are challenged when there is not enough labelled data, and the problem worsens when the available training data are drawn from different distributions in different domains. To deal with such situations, deep unsupervised domain adaptation techniques have recently come into wide use. This study surveys such domain adaptation methods as applied to classification tasks in computer vision. The survey covers very recent papers on this topic that were not included in previous surveys and introduces a taxonomy that groups published unsupervised domain adaptation methods into five categories: discrepancy-, adversarial-, reconstruction-, representation-, and attention-based methods.

Classifying functional nuclear images with convolutional neural networks: a survey

- Author(s): Qiang Lin ; Zhengxing Man ; Yongchun Cao ; Tao Deng ; Chengcheng Han ; Chuangui Cao ; Linjun Zhang ; Sitao Zeng ; Ruiting Gao ; Weilan Wang ; Jinshui Ji ; Xiaodi Huang
- Source: IET Image Processing, Volume 14, Issue 14, p. 3300–3313
- DOI: 10.1049/iet-ipr.2019.1690
- Type: Article

Functional imaging has successfully been applied to capture functional changes in the pathological tissues of the body in recent years. Nuclear medicine functional imaging has been used to acquire information about areas of concern (e.g. lesions and organs) in a non-invasive manner, enabling semi-automated or automated decision-making for disease diagnosis, treatment, evaluation, and prediction. Focusing on functional nuclear medicine images, in this study, the authors review existing work on the classification of single-photon emission computed tomography, positron emission tomography, and their hybrid modalities with computed tomography and magnetic resonance imaging by using convolutional neural network (CNN) techniques. Specifically, they first present an overview of nuclear imaging and the CNN technique, covering nuclear imaging modalities, nuclear image data formats, CNN architecture, and the main CNN classification models. According to the diseases of concern, they then classify the existing CNN-based work on the classification of functional nuclear images into three different categories. For the typical work in each of these categories, they present details about the research objectives, adopted CNN models, and main results. Finally, they discuss research challenges and directions for developing technological solutions to classify nuclear medicine images based on the CNN technique.
- Author(s): Xuyan Zou ; Hanwu He ; Yueming Wu ; Youbin Chen ; Mingxi Xu
- Source: IET Image Processing, Volume 14, Issue 14, p. 3314 –3323
- DOI: 10.1049/iet-ipr.2019.1087
- Type: Article
Three-dimensional (3D) point cloud registration is a fundamental issue in 3D reconstruction, 3D object recognition, and augmented reality. In this study, the authors propose a novel local feature descriptor called the local angle statistics histogram (LASH) for efficient 3D point cloud registration. LASH describes local shape geometry by encoding the angles between a point's normal vector and the vectors from that point to the other points in its local neighbourhood. In addition, the authors propose a 3D point cloud registration algorithm based on LASH. The algorithm first detects triangles of matching points with consistent similarity ratios, then aggregates each pair of matching triangles into a set of matching points. These matching sets are used to calculate multiple candidate transformations between the two point clouds. Finally, an error function identifies the best transformation, achieving coarse alignment of the two point clouds. Experiments and comparisons with other global algorithms demonstrate that the proposed approach can register point clouds with considerable or limited overlap and is robust to noise.
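The angle statistic at the heart of LASH can be sketched in a few lines. This is a simplified illustration, not the authors' exact descriptor: the function name, the uniform binning over [0, π], and the toy data are all illustrative assumptions.

```python
import numpy as np

def angle_histogram(p, normal, neighbours, bins=8):
    """Histogram of angles between a point's normal vector and the vectors
    from the point to its neighbours (illustrative LASH-style statistic)."""
    n = normal / np.linalg.norm(normal)
    v = neighbours - p                               # vectors point -> neighbour
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    cos_a = np.clip(v @ n, -1.0, 1.0)
    angles = np.arccos(cos_a)                        # angles lie in [0, pi]
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, np.pi))
    return hist / len(neighbours)                    # normalised descriptor

# toy example: neighbours in the plane z = 0 around the origin,
# with the surface normal pointing along z
p = np.zeros(3)
nbrs = np.array([[1.0, 0, 0], [0, 1.0, 0], [-1.0, 0, 0], [0, -1.0, 0]])
h = angle_histogram(p, np.array([0.0, 0.0, 1.0]), nbrs, bins=4)
```

For a planar neighbourhood every angle is π/2, so all the mass falls into the bin containing π/2 — the histogram shape is what distinguishes flat regions from curved ones.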
- Author(s): Shaoqiong Huang ; Mengxing Huang ; Yu Zhang ; Jing Chen ; Uzair Bhatti
- Source: IET Image Processing, Volume 14, Issue 14, p. 3324 –3332
- DOI: 10.1049/iet-ipr.2019.0772
- Type: Article
Pre-segmentation is a crucial step in medical image analysis. Many approaches have been proposed to improve both the quality and the efficiency of segmentation, but existing methods lack robustness to variation in the edges and textures of the target. To address these drawbacks, a novel attention Gabor network (AGnet) based on deep learning is proposed for medical image segmentation; it automatically pays more attention to edges and consistently improves segmentation performance. The proposed model consists of two components: the first determines the approximate location of the organs of interest in the image using convolution filters, and the second highlights salient edge features for a specific segmentation task using Gabor filters. To facilitate collaboration between the two parts, a region attention mechanism based on Gabor maps is introduced; it improves performance by learning to focus on the salient regions of the image that are useful for the task at hand. The experimental results indicate that AGnet enhances prediction performance while maintaining computational efficiency, making it comparable with other state-of-the-art approaches.
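The Gabor filters that AGnet uses to highlight edge features follow the standard textbook formula (a Gaussian envelope modulated by a cosine carrier); a minimal numpy sketch, with parameter values chosen purely for illustration:

```python
import numpy as np

def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lam=10.0, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: Gaussian envelope times a cosine carrier.
    Standard formula; these default parameters are illustrative only."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * xr / lam + psi)

k = gabor_kernel()
```

Convolving an image with a bank of such kernels at several orientations `theta` yields the orientation-selective edge responses that an attention mechanism can then weight.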
- Author(s): Gajendra Kumar Mourya ; Dinesh Bhatia ; Akash Handique
- Source: IET Image Processing, Volume 14, Issue 14, p. 3333 –3340
- DOI: 10.1049/iet-ipr.2019.0690
- Type: Article
Segmentation of the liver from 3D computed tomography volumes plays a significant role in trajectory development for computer-assisted interventional surgery for liver disease. Despite many studies, liver segmentation remains a challenging task owing to the lack of clear edges along most liver boundaries, coupled with high variability in both anatomical and intensity patterns. Segmentation of the left portal vein poses an additional problem, as the size of this vein strongly influences estimation of the liver tumour area. An empirical greedy machine is proposed for precise, automated segmentation of the liver as well as the left portal vein; its robust training of liver features allows the liver to be segmented from other organs without omitting adjacent organs or the liver lobe region. The proposed method achieves one of the highest accuracies among segmentation methods, with performance evaluated using several measures, such as volumetric overlap error, relative absolute volume difference, average symmetric absolute surface distance (ASD), root mean square surface distance, and maximum symmetric ASD.
- Author(s): Waseem Waheed ; Mukhalad Al-nasrawi ; Guang Deng
- Source: IET Image Processing, Volume 14, Issue 14, p. 3341 –3354
- DOI: 10.1049/iet-ipr.2019.1577
- Type: Article
Edge-aware smoothing has proved to be a fundamental technique for various image processing and computer vision tasks. In this study, the authors introduce a local, non-iterative, and effective edge-preserving filter, namely the guided adaptive interpolation filter (GAIF). GAIF can be used as a post-processing step after any smoothing filter to improve its edge-preservation performance without reformulation. GAIF has an O(N) computational complexity, where N is the total number of pixels in the image. To further increase the edge-preservation efficiency of GAIF, two techniques are introduced and demonstrated. GAIF's efficiency is demonstrated and compared with state-of-the-art techniques on a number of tasks, including image smoothing, flash/no-flash image denoising/fusion, single-image dehazing, and image detail enhancement.
- Author(s): K. Vijila Rani and S. Joseph Jawhar
- Source: IET Image Processing, Volume 14, Issue 14, p. 3355 –3365
- DOI: 10.1049/iet-ipr.2020.0407
- Type: Article
A pulmonary tumour is a chronic disorder caused by abnormal growth of lung cells. This study proposes a modern automated approach to improve the efficiency and reduce the difficulty of lung tumour diagnosis. The proposed lung tumour detection algorithm consists of four phases: image acquisition, pre-processing, segmentation, and feature classification. In the first stage, image acquisition, the input lung image is read and resized. In the second stage, pre-processing, Perona–Malik diffusion with an unsharp masking filter is proposed for enhancement. In the third stage, segmentation, an improved histogram-based fast 2D Otsu thresholding is proposed for lung tumour segmentation. Finally, various classifiers are adopted: a linear discriminant analysis classifier, a support vector machine (SVM) classifier, an SVM–sequential minimal optimisation classifier, a Naive Bayes classifier, and the proposed SVM–advanced sequential minimal optimisation (SVM–ASMO) classifier. An overall accuracy of 0.962 is obtained using the proposed SVM–ASMO method, which helps diagnose cancer cells using an automatic feature-extraction process. The specificity, precision, recall, and F1 score of the proposed method are 0.984, 0.974, 0.98, and 0.984, respectively.
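The 2D Otsu variant above builds on the classical 1D Otsu rule: choose the grey level that maximises the between-class variance of the two resulting pixel classes. A minimal numpy sketch of the classical 1D rule (not the authors' improved 2D version):

```python
import numpy as np

def otsu_threshold(img):
    """Classical 1D Otsu thresholding: maximise between-class variance
    over all candidate grey levels of an 8-bit image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                          # class-0 probability mass
    mu = np.cumsum(p * np.arange(256))            # class-0 cumulative mean
    mu_t = mu[-1]                                 # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)            # empty classes score 0
    return int(np.argmax(sigma_b2))

# bimodal toy image: dark square on a bright background
img = np.full((32, 32), 200, dtype=np.uint8)
img[8:24, 8:24] = 30
t = otsu_threshold(img)
```

On a cleanly bimodal histogram the chosen threshold lands between the two modes, separating lesion-like dark pixels from the background.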
- Author(s): Hazique Aetesam ; Kumari Poonam ; Suman Kumar Maji
- Source: IET Image Processing, Volume 14, Issue 14, p. 3366 –3372
- DOI: 10.1049/iet-ipr.2019.1763
- Type: Article
The authors present a proximal approach to hyperspectral image denoising adapted to the mixed-noise behaviour of hyperspectral data, named the hyperspectral image proximal denoiser (HSIProxDenoiser). A combination of Gaussian and impulse noise is handled under a maximum a posteriori framework using two data-fidelity terms. Prior information about the data is incorporated in the form of two regularisation terms, namely Tikhonov–Miller (TM) and total variation (TV). Since TV possesses a feature-selection capability, setting some of the coefficients to zero, it works well when there is a small number of significant features; TM, on the other hand, works well when there are many similar features. Hence, including both regularisation terms helps achieve the desired denoising performance. The resulting optimisation problem is solved using a variant of the primal-dual hybrid gradient method by splitting it into separate functions and computing their proximal operators individually. Experimental results on both synthetic and real hyperspectral image data validate the potential of the proposed technique, both visually and in terms of quantitative metrics.
- Author(s): Sheng Xiang ; Dong Liang ; Shun'ichi Kaneko ; Hirokazu Asano
- Source: IET Image Processing, Volume 14, Issue 14, p. 3373 –3384
- DOI: 10.1049/iet-ipr.2019.0724
- Type: Article
Defect detection is now an active research area for production quality assurance. Traditional visual inspection is performed by humans, which is time-consuming, labour-intensive, highly error-prone, and therefore unreliable. To overcome these problems, the authors propose a new method for detecting defects in printing on a 3D micro-textured surface. They use an orientation code as the basis for resisting fluctuations in illumination. Building on the consistency of pixel pairs, they develop a model called multiple paired pixel consistency to represent the statistical relationship between each pixel pair in defect-free images, and they design a defect-detection method based on this model. Even under varying defect sizes, illumination conditions, noise intensities, and other characteristics, the performance of the proposed algorithm is extremely stable and highly accurate: the recall, precision, and F-measure in most of the results reach 0.85, 0.93, and 0.9, respectively. In addition, the defect detection rate can reach almost 100%. This demonstrates that the authors' approach can achieve state-of-the-art accuracy in real industrial applications.
- Author(s): Shizheng Zhang ; Luwen Huangfu ; Zhifeng Zhang ; Sheng Huang ; Pu Li ; Heng Wang
- Source: IET Image Processing, Volume 14, Issue 14, p. 3385 –3392
- DOI: 10.1049/iet-ipr.2020.0164
- Type: Article
Corners are highly important local features of images, and corner detection plays a crucial role in computer vision and image processing tasks such as object tracking and vehicle detection. The aim of corner detection research is to propose effective and efficient corner detectors. In this study, the authors first present a new measure of corner sharpness, termed the point-to-centroid distance (PCD), and then examine its behaviour, which displays characteristics that help distinguish corners from non-corners. Based on these behaviours, the authors propose a novel corner detector. Extensive experimental results demonstrate that the PCD technique is both effective and efficient for corner detection compared with six other contour-based corner detectors, in terms of two commonly used evaluation metrics: average repeatability and localisation error.
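A plain reading of the point-to-centroid idea — measure how far a contour point lies from the centroid of its neighbours on either side — can be sketched as below. The paper's exact PCD formulation may differ; this toy version only illustrates why the statistic separates corners from straight-edge points.

```python
import numpy as np

def pcd(contour, k=3):
    """Point-to-centroid distance for each point of a closed contour:
    distance from the point to the centroid of its k neighbours on each
    side (illustrative reading; not necessarily the paper's exact measure)."""
    n = len(contour)
    out = np.empty(n)
    for i in range(n):
        idx = [(i + j) % n for j in range(-k, k + 1) if j != 0]
        centroid = contour[idx].mean(axis=0)
        out[i] = np.linalg.norm(contour[i] - centroid)
    return out

# closed square contour sampled at 4 points per edge; corners at 0, 4, 8, 12
t = np.linspace(0, 1, 5)[:-1]
corners = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], float)
edges = [a + t[:, None] * (b - a)
         for a, b in zip(corners, np.roll(corners, -1, axis=0))]
contour = np.vstack(edges)
score = pcd(contour, k=3)
```

On a straight edge the neighbour centroid stays close to the point, so the distance is small; at a corner the neighbours bend away and the distance jumps, which is the cue a detector can threshold.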
- Author(s): Arati Kushwaha ; Ashish Khare ; Om Prakash ; Manish Khare
- Source: IET Image Processing, Volume 14, Issue 14, p. 3393 –3404
- DOI: 10.1049/iet-ipr.2019.0960
- Type: Article
Segmentation of a moving object in video with a moving background is a challenging problem, and it becomes more difficult under varying illumination. The authors propose a dense optical flow-based background subtraction technique for object segmentation that is fast and reliable for segmenting moving objects in realistic, unconstrained videos. In the proposed work, camera motion is first stabilised by computing a homography matrix; statistical background modelling is then performed using a single-Gaussian background model. Moving pixels are identified using dense optical flow in the background-modelled scene: the dense optical flow provides the motion of each pixel between consecutive frames, so a motion flow vector is computed for every pixel, and thresholding its magnitude labels each pixel as foreground or background. The effectiveness of the proposed algorithm has been evaluated both qualitatively and quantitatively on several realistic videos with different complex conditions. Compared with other state-of-the-art methods, the proposed method is found to outperform them.
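The final labelling step — thresholding the magnitude of each pixel's motion flow vector — can be sketched in numpy. In the actual pipeline the flow field would come from a dense optical-flow estimator after homography stabilisation; here a hand-made flow field stands in for it, and the threshold value is illustrative.

```python
import numpy as np

def moving_pixel_mask(flow, thresh=1.0):
    """Label a pixel as 'moving' when the magnitude of its motion flow
    vector between consecutive frames exceeds a threshold.
    `flow` has shape (H, W, 2): per-pixel (dx, dy) displacement."""
    mag = np.linalg.norm(flow, axis=2)
    return mag > thresh

# toy flow field: static scene except a 4x4 patch moving right by 3 px
flow = np.zeros((10, 10, 2))
flow[2:6, 2:6, 0] = 3.0
mask = moving_pixel_mask(flow, thresh=1.0)
```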
- Author(s): Shun Qin
- Source: IET Image Processing, Volume 14, Issue 14, p. 3405 –3413
- DOI: 10.1049/iet-ipr.2020.0194
- Type: Article
L1-norm regularisation plays an important role in compressed sensing reconstruction and image restoration. However, the non-differentiability of the L1-norm function makes the resulting optimisation problem very challenging for traditional optimisation methods. In this article, a simple but efficient algorithm is proposed for L1-norm regularised compressed sensing and image restoration problems. In the proposed algorithm, the L1-norm regularised optimisation problem is converted into a non-linear optimisation problem by approximating the L1 norm with a smoothing function, which can then be solved by existing, powerful non-linear optimisation methods. The simulation results show that the proposed algorithm is more efficient and yields a more accurate solution. Compared with existing methods, the proposed algorithm is very easy to implement and promising for applications in medical and biological imaging.
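The conversion described above can be sketched with one common smoothing choice, replacing |x| by sqrt(x² + ε); the paper's particular smoothing function may differ. Once smoothed, any gradient-based solver applies — plain gradient descent is used here for brevity.

```python
import numpy as np

def smoothed_l1_objective(x, A, b, lam, eps=1e-4):
    # 0.5*||Ax - b||^2 + lam * sum(sqrt(x^2 + eps)): a differentiable
    # surrogate for the L1-regularised objective (illustrative smoothing).
    r = A @ x - b
    return 0.5 * r @ r + lam * np.sum(np.sqrt(x**2 + eps))

def gradient(x, A, b, lam, eps=1e-4):
    return A.T @ (A @ x - b) + lam * x / np.sqrt(x**2 + eps)

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -1.5]          # sparse ground-truth signal
b = A @ x_true

x = np.zeros(10)
for _ in range(500):                  # plain gradient descent on the surrogate
    x -= 0.01 * gradient(x, A, b, lam=0.01)
```

With a small ε the surrogate's minimiser is close to the L1-regularised one, so the recovered vector is near the sparse ground truth.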
- Author(s): Michael Joon Seng Goh ; Yeong Shiong Chiew ; Ji Jinn Foo
- Source: IET Image Processing, Volume 14, Issue 14, p. 3414 –3421
- DOI: 10.1049/iet-ipr.2020.0334
- Type: Article
Accurate and robust three-dimensional reconstruction of objects enables applications in many aspects of modern life, yet it typically suffers from outliers and noise that often need to be post-processed. Although many algorithms can effectively remove the outliers, most require some manual tuning of parameters or parameters set by rule of thumb. Machine learning and artificial intelligence-based methods have also been introduced, but they may require vast parallel computing resources as well as training data. In the present study, a novel combinatory distance-based method capable of high-accuracy outlier detection, named the sorted distance divergence point (SDDP), is introduced. Results show that SDDP achieves an average accuracy of 98% in outlier detection. Moreover, the introduced distance function and outlier percentage allow clear labelling of inlier and outlier cloud points. SDDP therefore presents an attractive enhancement to existing methods, in that manual parameter tuning may not be necessary. The adaptability and utility of SDDP are further demonstrated by incorporating it into current methods to produce a high-accuracy outlier detector, which, when tested on 17 objects with 20–50% outliers, attains F1 and F2 scores averaging 0.960 and 0.968, respectively.
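The generic idea behind a sorted-distance divergence point — sort each point's mean k-NN distance and cut the sorted curve at its largest jump — can be sketched as follows. This is a simplified heuristic in the same spirit, not the authors' exact SDDP definition.

```python
import numpy as np

def knn_outliers(points, k=15):
    """Sort mean k-NN distances and split at the largest jump ('divergence
    point') in the sorted curve. Generic heuristic, not the exact SDDP."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)
    score = d[:, 1:k + 1].mean(axis=1)        # skip the zero self-distance
    order = np.argsort(score)
    gaps = np.diff(score[order])
    cut = int(np.argmax(gaps)) + 1            # first 'outlier' in sorted order
    mask = np.zeros(len(points), dtype=bool)
    mask[order[cut:]] = True
    return mask

rng = np.random.default_rng(1)
cloud = rng.normal(0.0, 0.1, size=(200, 3))        # dense inlier cloud
outliers = rng.uniform(5.0, 6.0, size=(10, 3))     # far-away spurious points
pts = np.vstack([cloud, outliers])
mask = knn_outliers(pts, k=15)
```

Because the cut point is found from the curve itself, no distance threshold has to be tuned by hand — the property the abstract highlights.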
- Author(s): Charles-Alban Deledalle and Jérôme Gilles
- Source: IET Image Processing, Volume 14, Issue 14, p. 3422 –3432
- DOI: 10.1049/iet-ipr.2019.1442
- Type: Article
A new blind image deconvolution technique is developed for atmospheric turbulence deblurring, overcoming the limitations of 'generic' blind deconvolution algorithms that do not take into account the complicated physics of turbulence. The originality of the proposed approach lies in its use of an actual physical model, known as the Fried kernel, which quantifies the impact of atmospheric turbulence on the optical resolution of images. While the original expression of the Fried kernel can seem cumbersome at first sight, the authors show that it can be reparameterised in a much simpler form. This simple expression allows the kernel to be embedded efficiently in the proposed blind atmospheric turbulence deconvolution (BATUD) algorithm. BATUD is an iterative algorithm that alternately performs deconvolution and estimates the Fried kernel, jointly relying on a Gaussian mixture model prior on natural image patches and controlling the squared Euclidean norm of the Fried kernel. Numerical experiments show that the proposed blind deconvolution algorithm behaves well in different simulated turbulence scenarios as well as on real images. Not only does BATUD outperform state-of-the-art atmospheric turbulence deconvolution approaches in terms of image quality metrics, it is also faster.
- Author(s): Qingxu Fu ; Xiaoguang Di ; Yu Zhang
- Source: IET Image Processing, Volume 14, Issue 14, p. 3433 –3443
- DOI: 10.1049/iet-ipr.2020.0100
- Type: Article
Low-light images suffer from severe noise and low illumination. In this work, the authors propose an adaptive low-light raw image enhancement network that avoids the parameter hand-crafting of current deep learning models and improves image quality. The proposed method comprises two sub-models: brightness prediction and exposure shifting (ES). The former controls the brightness of the resulting image by estimating a guideline exposure time; the latter learns to approximate an exposure-shifting operator that converts a low-light image captured with the real exposure time into a noise-free image at the guideline exposure time. Additionally, a structural similarity loss and an image enhancement vector are introduced to promote image quality, and a new campus image dataset (CID) is proposed for training the model, overcoming limitations of existing datasets. Quantitative tests show that the proposed method attains the lowest noise-level estimation score among state-of-the-art low-light algorithms, suggesting superior denoising performance, and that it adaptively controls global image brightness according to the content of the image scene. Lastly, a potential application in video processing is briefly discussed.
- Author(s): Ali Mirza and Imran Siddiqi
- Source: IET Image Processing, Volume 14, Issue 14, p. 3444 –3455
- DOI: 10.1049/iet-ipr.2019.1070
- Type: Article
This study focuses on the recognition of cursive text appearing in videos, using a complete deep neural network framework. While mature video optical character recognition systems (V-OCRs) are available for text in non-cursive scripts, recognition of cursive scripts is marked by many challenges, including complex and overlapping ligatures, context-dependent shape variations, and the presence of a large number of dots and diacritics. The authors present an analytical technique for recognising cursive caption text that relies on a combination of convolutional and recurrent neural networks trained in an end-to-end framework. Text lines extracted from video frames are preprocessed to segment the background and fed to a convolutional neural network for feature extraction. The extracted feature sequences are fed, along with the ground-truth transcription, to different variants of bidirectional recurrent neural networks to learn a sequence-to-sequence mapping. Finally, a connectionist temporal classification layer produces the final transcription. Experiments on a dataset of more than 40,000 text lines from 11,192 video frames of various news channel videos report an overall character recognition rate of 97.63%. The proposed work employs Urdu text as a case study, but the findings can be generalised to other cursive scripts as well.
- Author(s): S. Pramod Kumar ; Mrityunjaya V. Latte ; Sangeeta K. Siri
- Source: IET Image Processing, Volume 14, Issue 14, p. 3456 –3462
- DOI: 10.1049/iet-ipr.2020.0671
- Type: Article
The authors present a novel computerised scheme to segment pulmonary nodules using freehand sketches. Freehand sketching serves as a natural way of identifying the location of a nodule while also providing adaptive information that can be inferred, such as size, density, texture, and mass centre. The proposed scheme includes two phases. In the first phase, freehand-sketch analysis yields multiple seed points used to select the region of interest. In the second phase, volumetric extraction of the nodule is performed using geometric modelling and implicit surface reconstruction for volumetric analysis: spherical bins are used for ray–triangle intersections, followed by local implicit surface fitting and blending for surface reconstruction and depiction. The performance of the proposed scheme is assessed for accuracy and consistency using 112 CT examinations from the LIDC database. The IoU and ASD metrics are used to assess the discrepancy between the proposed method and inter-observer agreement. In estimating reproducibility, the discrepancies between the proposed scheme and the expert's manual contouring are observed to average 0.13 ± 0.07 mm and 3.04 ± 1.7 mm, respectively. The experiments show that the proposed scheme performs reasonably well and demonstrates the merit of freehand sketching.
- Author(s): Harmandeep Singh Gill and Baljit Singh Khehra
- Source: IET Image Processing, Volume 14, Issue 14, p. 3463 –3470
- DOI: 10.1049/iet-ipr.2018.5310
- Type: Article
Fruit image classification is an ill-posed problem. Many machine learning techniques have been developed to improve fruit image classification, but their performance depends on the quality of the acquired fruit images, so the performance of competitive fruit classification techniques degrades for images captured under poor environmental conditions such as haze, fog, and smog. To overcome this issue, a type-II fuzzy-based image improvement approach is employed to improve the visibility of weather-degraded fruit images. The fruit images are then classified using an integrated classification model that combines two well-known models, a convolutional neural network (CNN) and a recurrent neural network (RNN): the CNN is used to extract discriminative features of the fruit images, and the RNN is used to assess sequential labels. Extensive analysis shows that the proposed integrated classification model outperforms competitive fruit image classification techniques in terms of accuracy and coefficient of correlation.
- Author(s): Jian Bai and Xiang-Chu Feng
- Source: IET Image Processing, Volume 14, Issue 14, p. 3471 –3480
- DOI: 10.1049/iet-ipr.2018.5499
- Type: Article
In this study, the authors propose a fractional derivative-based image decomposition and denoising model that decomposes an image into a cartoon component (formed by homogeneous regions with sharp boundaries) and a texture (or noise) component. The cartoon component is modelled by a function of fractional-order total bounded variation, while the texture component is modelled by an oscillatory function bounded in the negative Sobolev-space norm. The authors give the corresponding minimisation functional and, after some transformations, solve the resulting fractional-order partial differential equation using the Fourier transform. Exploiting the symmetry and asymmetry of the fractional-order derivative, some generalisations and variants of the proposed model are also introduced. Finally, the authors implement the algorithm using fractional-order finite differences in the frequency domain. The experimental results demonstrate that the proposed models yield objective and visual improvements over other standard approaches in decomposition and denoising tasks.
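The computational step that makes the Fourier approach attractive is that a fractional derivative becomes a pointwise multiplier in the frequency domain; for a periodic signal, differentiating to order α means multiplying the spectrum by (iω)^α. A 1D numpy sketch (the principal-branch power used here is one of several conventions for non-integer α):

```python
import numpy as np

def fractional_derivative(f, alpha, L=2 * np.pi):
    """Fourier fractional derivative of a periodic 1D signal:
    multiply the spectrum by (i*omega)**alpha (principal branch)."""
    n = len(f)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular frequencies
    mult = (1j * omega) ** alpha
    return np.real(np.fft.ifft(mult * np.fft.fft(f)))

x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
d1 = fractional_derivative(np.sin(x), alpha=1.0)      # should match cos(x)
```

Setting α = 1 recovers the ordinary spectral derivative, which is a quick sanity check; intermediate α interpolates smoothly between the identity and the derivative, which is what lets the model tune the cartoon component's regularity.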
- Author(s): He Yu and Nannan Yu
- Source: IET Image Processing, Volume 14, Issue 14, p. 3481 –3489
- DOI: 10.1049/iet-ipr.2020.0639
- Type: Article
Conditional generative adversarial networks (GANs) have led to considerable improvements in conditional image generation. However, several problems remain to be solved, including the low definition of generated images, mode collapse, vanishing gradients, and slow convergence. In this study, the authors propose a novel conditional GAN model, the fewer-iterations GAN (FI-GAN), that requires fewer iterations to generate controllable and realistic images. First, to improve the feature-extraction performance of the discriminator and the data-distribution fitting of the generator, two residual blocks are added to the discriminator and generator. Then, two new loss functions, quality and category losses for both the discriminator and generator, are designed to enhance the detail and distinguishability of the generated images. Finally, to avoid mode collapse and improve convergence, the Wasserstein distance is used as the quality loss function. On the CIFAR-10 and large-scale scene understanding (LSUN) datasets, FI-GAN generates detailed and diverse colour samples in fewer iterations, achieving better performance than four conditional GAN methods under three objective criteria: the inception score, Fréchet inception distance, and kernel maximum mean discrepancy.
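The Wasserstein quality loss mentioned above, in its standard WGAN form (a sketch of the general formulation, not FI-GAN's full loss, which also includes a category term), amounts to the critic widening the score gap between real and generated batches while the generator pushes fake scores up:

```python
import numpy as np

def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: minimise E[D(fake)] - E[D(real)], i.e. maximise
    the estimated Wasserstein gap between real and generated samples."""
    return np.mean(fake_scores) - np.mean(real_scores)

def generator_loss(fake_scores):
    """Generator minimises -E[D(fake)], pushing fake scores upward."""
    return -np.mean(fake_scores)

real = np.array([2.0, 3.0, 4.0])   # critic scores on a real batch
fake = np.array([0.0, 1.0, 2.0])   # critic scores on a generated batch
cl = critic_loss(real, fake)
gl = generator_loss(fake)
```

Because these losses stay meaningful even when the two distributions barely overlap, gradients do not vanish the way they can with the original GAN loss, which is why the Wasserstein distance helps against mode collapse and slow convergence.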
- Author(s): Alireza Mohammadi Anbaran ; Pooya Torkzadeh ; Reza Ebrahimpour ; Nasour Bagheri
- Source: IET Image Processing, Volume 14, Issue 14, p. 3490 –3498
- DOI: 10.1049/iet-ipr.2019.0264
- Type: Article
Object recognition in the visual cortex of mammals has inspired many computational object recognition models. The hierarchical model and X (HMAX) is a well-known biologically motivated object recognition model with scale and position tolerance and high accuracy. Owing to its computationally intensive nature, hardware implementation with massively parallel processing is suggested for real-time applications. However, it is important to explore algorithmic trade-offs when mapping an algorithm to reconfigurable hardware: a direct conversion of the software implementation of an algorithm generally results in inefficient use of hardware resources. In this study, the authors propose a novel modification of the HMAX model that makes it suitable for hardware implementation. More precisely, to reduce the number of memory blocks and multipliers the S2 layer of HMAX requires, they replace the first norm with the second norm, which critically affects the silicon area in an application-specific integrated circuit implementation, or the required resources in a field-programmable gate array (FPGA). To evaluate the proposed model, they implement a pipelined version of the revised model on a mid-range commercial Xilinx FPGA, the XC6VLX240T from the Virtex-6 family, using ISE. Compared with a recent hardware implementation of HMAX, the proposed model offers an 83% reduction in DSP48 slice usage and a 3% reduction in memory blocks.
- Author(s): Feng Jiang ; Na Li ; Lili Zhou
- Source: IET Image Processing, Volume 14, Issue 14, p. 3499 –3507
- DOI: 10.1049/iet-ipr.2019.1761
- Type: Article
Grain segmentation of sandstone images partitions the images into non-overlapping regions, each of which is an independent mineral grain. However, a sandstone image usually contains hundreds of mineral grains and complicated micro-structures, which makes current segmentation methods inefficient. In this study, the authors propose a three-stage framework for the automatic segmentation of sandstone images. In the first stage, the input sandstone images are pre-segmented into over-segmented mineral superpixels. In the second stage, instance-independent features are extracted by a specially designed convolutional neural network, and instance-aware features are extracted by computing histogram statistics and Gabor responses of the mineral superpixels. In the third stage, a novel weighted fuzzy clustering algorithm clusters the mineral superpixels into different classes, after which adjacent mineral superpixels are merged according to their classes to yield the complete minerals. Experimental results on sandstone image datasets demonstrate the effectiveness of the proposed method, which clearly outperforms state-of-the-art segmentation methods.
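The clustering stage builds on fuzzy c-means, which alternates membership and centroid updates; a minimal sketch of the plain (unweighted) algorithm on feature vectors — the paper's weighted variant and its superpixel features are omitted here:

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: alternate membership and centroid updates.
    (The paper adds per-superpixel weights; this sketch omits them.)"""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # random fuzzy memberships
    for _ in range(iters):
        w = U ** m
        V = (w.T @ X) / w.sum(axis=0)[:, None]        # weighted centroids
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))              # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return U, V

# two well-separated 2D clusters standing in for superpixel features
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.2, (50, 2)),
               np.random.default_rng(2).normal(5.0, 0.2, (50, 2))])
U, V = fuzzy_cmeans(X)
labels = U.argmax(axis=1)
```

Unlike hard k-means, each superpixel keeps a membership degree for every class, which is what lets the merging stage reason about uncertain boundaries between grains.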
- Author(s): Chunhong Cao ; Wei Duan ; Kai Hu ; Fen Xiao
- Source: IET Image Processing, Volume 14, Issue 14, p. 3508 –3515
- DOI: 10.1049/iet-ipr.2019.0834
- Type: Article
- + Show details - Hide details
-
p.
3508
–3515
(8)
Compressive sensing magnetic resonance (MR) imaging aims to achieve high-quality MR image reconstruction from undersampled k-space data. It is crucial to explore prior information, since compressive sensing MR imaging relies heavily on prior assumptions such as the signal's sparsity. In this study, in order to exploit prior information fully, an improved MR image reconstruction model based on compressive sensing theory is proposed, named reference-image MR imaging with adaptive tight frame. In the proposed model, an adaptive tight frame is introduced to capture sparse prior information adapted to MR images as well as similarity prior information relative to the target image. Meanwhile, improved adaptive weighting parameters are used to trade off sparsity between regions of high similarity and regions of little similarity. In addition, the smoothing-based fast iterative shrinkage-thresholding algorithm is utilised to tackle the optimisation problem so as to speed up imaging. Experimental results demonstrate that the proposed MR image reconstruction method outperforms several state-of-the-art methods in terms of quantitative results.
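The smoothing-based fast iterative shrinkage-thresholding solver mentioned above builds on the standard FISTA iteration. The following is a minimal sketch for the plain ℓ1-regularised least-squares problem; the authors' adaptive tight frame and reference-image terms are not reproduced, and the random sensing matrix is an assumption for demonstration.

```python
import numpy as np

# FISTA for  min_x 0.5*||A x - b||^2 + lam*||x||_1 :
# gradient step + soft-thresholding prox + Nesterov momentum.

def soft_threshold(v, t):
    # proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, b, lam, n_iter=300):
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Toy demo: recover a 3-sparse vector from 40 random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 80))
x_true = np.zeros(80)
x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
b = A @ x_true
x_hat = fista(A, b, lam=0.05)
```

In the MRI setting, `A` would be the undersampled Fourier operator composed with the learned tight frame rather than a dense random matrix.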
- Author(s): Vishnuvarthanan Govindaraj ; Arunprasath Thiyagarajan ; Pallikonda Rajasekaran ; Yudong Zhang ; Rajesh Krishnasamy
- Source: IET Image Processing, Volume 14, Issue 14, p. 3516 –3526
- DOI: 10.1049/iet-ipr.2020.0597
- Type: Article
- + Show details - Hide details
-
p.
3516
–3526
(11)
This research study is intended to deliver effective magnetic resonance (MR) brain image segmentation, a challenging task in medical image analysis. In general, an MR brain image comprises various tissue structures, and an accurate representation of these regions is essential for identifying different grades of tumours and for effectively demarcating the areas over which the oedema portion is widespread. Accurate representation and identification of the abnormal regions in MR images can be a vital tool for radiologists and oncologists in planning further treatment. This study develops a novel automated approach that combines self-organising maps and interval type-2 fuzzy logic clustering, providing clinicians with ample knowledge for identifying aberrant regions in the patient's brain. The proposed methodology offers non-invasive analysis with fast segmentation, and its performance has been assessed using comparison metrics such as mean-squared error (MSE), peak signal-to-noise ratio (PSNR), processing time, and a few other standard metrics. The proposed methodology achieved commendable MSE and PSNR values of 0.234778 and 54.847 dB, respectively, and can be utilised for analysing patient diseases.
- Author(s): Suncheng Xiang ; Yuzhuo Fu ; Ting Liu
- Source: IET Image Processing, Volume 14, Issue 14, p. 3527 –3535
- DOI: 10.1049/iet-ipr.2020.0166
- Type: Article
- + Show details - Hide details
-
p.
3527
–3535
(9)
This article studies a novel transfer learning problem termed distant domain transfer learning. Different from traditional transfer learning, which assumes a close relation between source and target data, the objective here is to execute an unseen and unrelated task based on a previously trained labelled data set, without any samples from intermediate domains. To this end, the authors propose the deep unsupervised progressive learning (DUPL) framework and its upgraded version, end-to-end DUPL (eDUPL). The framework consists of two components, i.e. (i) translating the style of labelled images from an irrelevant source domain to the target domain and (ii) learning a domain adaptation model with progressive learning for testing on the target domain. In comparison, eDUPL can integrate the two components of the framework seamlessly. In general, the proposed method is easy to implement and can be viewed as a strong convolutional baseline for the distant domain adaptation task. Comprehensive experiments on the VeRi Vehicle, CUB-200-2011 Birds and Oxford5k Buildings data sets indicate that the proposed method robustly achieves state-of-the-art performance compared with existing approaches, demonstrating the effectiveness and superiority of the proposed algorithm.
- Author(s): Noushin Hajarolasvadi and Hasan Demirel
- Source: IET Image Processing, Volume 14, Issue 14, p. 3536 –3546
- DOI: 10.1049/iet-ipr.2019.1566
- Type: Article
- + Show details - Hide details
-
p.
3536
–3546
(11)
Recently, video-based facial emotion recognition (FER) has become an attractive topic in the computer vision community. However, processing several hundred frames for a single video of a particular emotion is not efficient. In this study, the authors propose a novel approach to obtain a representative set of frames for a video in the eigenspace domain. Principal component analysis (PCA) is applied to a single emotional video, extracting the most significant eigenframes representing the temporal motion variance embedded in the video. Given that faces are segmented and normalised, the variance captured by PCA is attributed to the facial expression dynamics. The variation in the temporal domain is mapped to the eigenspace, reducing the redundancy. The proposed approach is used to extract the input eigenframes. Later, VGG-16, ResNet50, and 2D and 3D CNN architectures called eigenFaceNet are trained on the RML, eNTERFACE'05, and AFEW 6.0 databases. The experimental results are superior to the state-of-the-art by 8 and 4% for the RML and eNTERFACE'05 databases, respectively. The performance achievement is also coupled with a reduction in computational time.
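The eigenframe extraction step can be sketched as ordinary PCA over flattened, mean-centred frames; the frame size, the number of retained components and the downstream CNNs (VGG-16, ResNet50, eigenFaceNet) used in the paper are not reproduced here.

```python
import numpy as np

# PCA over a video: each frame is flattened to a row, the temporal mean is
# removed, and the leading right singular vectors ("eigenframes") capture
# most of the expression variance, so a few can stand in for many frames.

def eigenframes(frames, k):
    """frames: (n_frames, h, w) array -> (k, h, w) leading eigenframes."""
    n, h, w = frames.shape
    X = frames.reshape(n, -1).astype(float)
    X -= X.mean(axis=0)                       # centre over time
    # rows of Vt are orthonormal principal directions in pixel space
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].reshape(k, h, w), s

rng = np.random.default_rng(1)
video = rng.random((30, 8, 8))                # toy "video": 30 frames of 8x8
ef, s = eigenframes(video, k=5)
```

The singular values `s` indicate how much temporal variance each eigenframe explains, which is how a cut-off `k` would be chosen in practice.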
- Author(s): Huan Yang ; Hongwei Li ; Yuping Duan
- Source: IET Image Processing, Volume 14, Issue 14, p. 3547 –3561
- DOI: 10.1049/iet-ipr.2019.1097
- Type: Article
- + Show details - Hide details
-
p.
3547
–3561
(15)
Rician noise reduction is an essential issue in magnetic resonance imaging (MRI). Recently, learning-based methods have achieved great success in dealing with image restoration problems, providing fast inference and good performance. One limitation of these methods, however, is that the training procedure is usually noise-level dependent, i.e. the trained models are bound to a specific noise level and lack the ability to adapt automatically to different noise levels. In this study, the authors propose a variational model for Rician noise removal by integrating a noise adaption function into the field-of-experts image prior, which can adapt to different noise levels. Instead of directly solving the energy minimisation problem, the authors unroll the gradient descent step of the energy functional for several iterations, the time-dependent parameters of which can be learned through a supervised training process. The authors call this methodology the noise adaptive trainable non-linear reaction–diffusion model. The proposed methodology is robust against changes in noise level and noise distribution. Experimental results over T1-, T2-, and PD-weighted MRI data sets demonstrate that the proposed model can achieve superior performance compared with other methods in terms of both the peak signal-to-noise ratio and the structural similarity index.
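The unrolling idea can be illustrated on a toy quadratic energy: a fixed number of gradient steps replaces full minimisation, and the per-step parameters are exactly what would be learned. The field-of-experts filters and the Rician noise adaption function of the actual model are not reproduced; the signal, step sizes and weights below are hand-picked assumptions.

```python
import numpy as np

# Gradient steps on E(x) = 0.5*||x - y||^2 + 0.5*lam*||D x||^2, executed a
# FIXED number of times.  In the trainable version, (steps, lams) for each
# stage would be learned from data instead of hand-picked.

def unrolled_denoise(y, steps, lams):
    n = len(y)
    D = np.eye(n, k=1) - np.eye(n)            # forward-difference operator
    D[-1, :] = 0.0                            # free boundary condition
    x = np.asarray(y, dtype=float).copy()
    for step, lam in zip(steps, lams):        # one stage of the unrolled "network"
        grad = (x - y) + lam * (D.T @ (D @ x))    # dE/dx at this stage
        x = x - step * grad
    return x

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0], 25)             # piecewise-constant test signal
noisy = clean + 0.3 * rng.standard_normal(50)
denoised = unrolled_denoise(noisy, steps=[0.3] * 10, lams=[1.0] * 10)
```

Because the number of stages is fixed, inference cost is constant, which is the practical appeal of unrolled (reaction–diffusion style) models.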
- Author(s): Lei Li ; Zhaoqiang Xia ; Huijian Han ; Guiqing He ; Fabio Roli ; Xiaoyi Feng
- Source: IET Image Processing, Volume 14, Issue 14, p. 3562 –3571
- DOI: 10.1049/iet-ipr.2020.0360
- Type: Article
- + Show details - Hide details
-
p.
3562
–3571
(10)
In recent years, image fusion methods based on deep networks have been proposed to combine infrared and visible images into a better fused image. However, issues such as limited training data, scarce reference images and misalignment of multi-source images still limit fusion performance. To address these problems, we propose an end-to-end shallow convolutional neural network with structural constraints, which has only one convolutional layer to fuse infrared and visible images. Different from other methods, our proposed model requires less training data and fewer reference images and is more robust to misalignment of an image pair. More specifically, the infrared image and the visible image are first provided as inputs to a convolutional layer to extract the information to be fused; then, all feature maps are concatenated and fed into a convolutional layer with one channel to obtain the fused image; finally, a structural similarity loss between the fused image and the input infrared and visible images is computed to update the network parameters and eliminate the effects of pixel misalignment. Extensive experiments show the effectiveness of our proposed method on the fusion of infrared and visible images, with performance that outperforms state-of-the-art methods.
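A sketch of the structural-similarity objective that makes reference-free training possible: the fused image is pushed to be structurally similar to both inputs, so no ground-truth fused image is needed. A global, single-window SSIM on [0, 255] images is shown; the paper's one-layer network, any local windowing and the loss weighting are not reproduced.

```python
import numpy as np

# Global SSIM between two images (single window over the whole image),
# and the reference-free fusion loss built from it.

def ssim_global(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    a = a.astype(float); b = b.astype(float)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def fusion_loss(fused, ir, vis):
    # to be minimised: dissimilarity of the fused image to BOTH sources
    return 2.0 - ssim_global(fused, ir) - ssim_global(fused, vis)

ir = np.tile(np.linspace(0.0, 255.0, 16), (16, 1))   # toy "infrared" image
vis = ir.T                                           # toy "visible" image
fused = 0.5 * (ir + vis)                             # naive average fusion
```

In training, `fusion_loss` would be back-propagated through the convolutional layers instead of being evaluated on a hand-made average.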
- Author(s): Osama A.S. Alkishriwo
- Source: IET Image Processing, Volume 14, Issue 14, p. 3572 –3578
- DOI: 10.1049/iet-ipr.2019.1699
- Type: Article
- + Show details - Hide details
-
p.
3572
–3578
(7)
With the growth of modern digital technologies, demand for transmitting multimedia and digital images, which require more storage space and transmission bandwidth, has increased rapidly. Hence, developing new image compression techniques that reduce data size without degrading image quality has gained a lot of interest recently. In this study, an adaptive multiresolution image decomposition (AMID) algorithm is proposed and its application to image compression is explored. The developed algorithm is capable of decomposing an image along the vertical, horizontal, and diagonal directions using a pyramidal multiresolution scheme. As with the wavelet transform, the AMID can be used for decimation with a guarantee of perfect signal reconstruction. Furthermore, the application of the AMID to image compression is explored and its performance is compared with state-of-the-art image compression techniques. The performance of the compression method is evaluated using peak signal-to-noise ratio and compression ratio. Experimental results have shown promising performance compared with the results of other image compression approaches.
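The perfect-reconstruction property claimed for the decomposition can be illustrated with a one-level Haar-style split, which is exactly invertible after decimation; AMID's adaptive, directional decomposition itself is not reproduced here.

```python
import numpy as np

# One-level analysis/synthesis pair: the signal is split into decimated
# approximation (low-pass) and detail (high-pass) halves, and the split
# is exactly invertible -- nothing is lost by the decimation.

def analyse(x):
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / 2.0        # pairwise averages, decimated
    detail = (x[0::2] - x[1::2]) / 2.0        # pairwise differences, decimated
    return approx, detail

def synthesise(approx, detail):
    x = np.empty(2 * len(approx))
    x[0::2] = approx + detail                 # recover even samples
    x[1::2] = approx - detail                 # recover odd samples
    return x

x = np.arange(16, dtype=float)
approx, detail = analyse(x)
```

Compression then comes from quantising or discarding small detail coefficients, not from the decomposition itself.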
- Author(s): Jin Sun ; Zhe Zhang ; Liutao Yang ; Jiping Zheng
- Source: IET Image Processing, Volume 14, Issue 14, p. 3579 –3587
- DOI: 10.1049/iet-ipr.2019.0924
- Type: Article
- + Show details - Hide details
-
p.
3579
–3587
(9)
Computational inefficiency and ambiguity due to self-occlusion are bottlenecks of vision-based hand gesture recognition. In this study, the authors address these issues by proposing a novel multi-view hand gesture recognition method based on the Pareto optimal front. They first present an oriented gradient local binary pattern operator to generate a groupwise gesture feature data set for multi-view hand gesture images. They then treat hand gesture recognition over multi-view hand images as a multi-query image retrieval problem, and a Pareto optimal front is constructed from the dissimilarities between the testing images and sample images. The gesture corresponding to the point with the shortest distance to the origin on the Pareto optimal front is the final recognition result. Extensive experiments verify the accuracy and efficiency of the Pareto optimal front-based method.
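The decision rule can be sketched directly: each candidate gesture receives one dissimilarity per query view, the non-dominated candidates form the Pareto optimal front, and the front point nearest the origin gives the recognised gesture. The feature operator and the dissimilarity values below are illustrative assumptions.

```python
# Pure-Python sketch of Pareto-front-based multi-query recognition.

def pareto_front(points):
    # a point is on the front if no other point is <= in every coordinate
    front = []
    for p in points:
        dominated = any(
            q != p and all(q[i] <= p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

def recognise(dissims):
    """dissims: {gesture: (dissimilarity to query view 1, view 2, ...)}"""
    front = pareto_front(list(dissims.values()))
    best = min(front, key=lambda p: sum(d * d for d in p))  # nearest to origin
    return next(g for g, d in dissims.items() if d == best)

# Hypothetical dissimilarities of four gesture candidates to two query views
dissims = {"fist": (0.2, 0.9), "palm": (0.3, 0.3),
           "point": (0.8, 0.2), "ok": (0.9, 0.9)}
# "palm" is on the front and closest to the origin, so it is recognised
```

Taking the whole front (rather than a single weighted sum of the views) is what lets a gesture that matches all views moderately well beat one that matches only a single view perfectly.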
- Author(s): Guoliang Hu ; Zuofeng Zhou ; Jianzhong Cao ; Huimin Huang
- Source: IET Image Processing, Volume 14, Issue 14, p. 3588 –3595
- DOI: 10.1049/iet-ipr.2019.1525
- Type: Article
- + Show details - Hide details
-
p.
3588
–3595
(8)
The precision of camera calibration is one of the key factors affecting attitude measurement accuracy in many computer vision tasks. This study proposes a new calibration approach for binocular cameras. Firstly, based on singular value decomposition, the best approximation to the essential matrix is taken as the initial guess, solved using the Frobenius norm. Secondly, the initial guess is refined through maximum likelihood estimation. A new expression is derived for computing the relative position matrix of the binocular cameras. The Levenberg–Marquardt algorithm is then applied to refine the initial guess. Large sets of synthesised and real point correspondences were tested to demonstrate the validity of the proposed method. Extensive experiments demonstrated that the proposed method outperforms state-of-the-art methods. The error rate of the proposed method was 0.5% for the length test and about 1% for the angle test at a range of 1 m. This method can advance three-dimensional (3D) computer vision one step further from laboratory environments towards real-world use.
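The SVD-based initial guess can be sketched as the standard closed-form projection of a noisy 3 × 3 estimate onto the nearest (in Frobenius norm) valid essential matrix, whose singular values must have the form (s, s, 0); the maximum-likelihood and Levenberg–Marquardt refinement stages are not reproduced, and the random input matrix is an assumption.

```python
import numpy as np

# Nearest-essential-matrix projection: decompose, average the two leading
# singular values, zero the third, and recompose.

def project_to_essential(F):
    U, s, Vt = np.linalg.svd(F)
    s_avg = (s[0] + s[1]) / 2.0
    return U @ np.diag([s_avg, s_avg, 0.0]) @ Vt

rng = np.random.default_rng(4)
F = rng.standard_normal((3, 3))      # stand-in for a noisy estimate
E = project_to_essential(F)
```

The projected matrix is then a valid starting point for the non-linear refinement, since it satisfies the rank-2, equal-singular-value constraint exactly.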
- Author(s): Haiqiao Liu ; Shibin Luo ; Jiazhen Lu
- Source: IET Image Processing, Volume 14, Issue 14, p. 3596 –3601
- DOI: 10.1049/iet-ipr.2019.1657
- Type: Article
- + Show details - Hide details
-
p.
3596
–3601
(6)
The matching algorithm is an important part of simultaneous localisation and mapping. To address the heavy computation and poor real-time performance of the traditional two-dimensional lidar correlation scan matching (CSM) algorithm, a multi-resolution auxiliary historical point cloud matching algorithm is proposed, which combines high and low resolutions and adopts a single-frame-to-multi-frame step-by-step matching scheme. The algorithm was evaluated on a sweeping robot. Compared with the traditional CSM algorithm and the iterative closest point algorithm, the single-position accuracy of the proposed method is improved: in an indoor space of ∼10 m × 10 m, the cumulative error is reduced by 16.24 and 33.96%, respectively. Consequently, the algorithm can still process in real time.
- Author(s): Baiju P.S. ; Deepak Jayan P. ; Sudhish N. George
- Source: IET Image Processing, Volume 14, Issue 14, p. 3602 –3612
- DOI: 10.1049/iet-ipr.2019.1409
- Type: Article
- + Show details - Hide details
-
p.
3602
–3612
(11)
Outdoor monitoring systems are known to perform well under normal weather conditions but lack effectiveness under inclement ones. Video footage captured under rainy conditions often contains several visual distortions, which lead to failures in subsequent computer vision algorithms such as object identification and tracking. Eliminating such rain effects is therefore essential before the video footage is processed by those algorithms. The present work formulates a new low-rank tensor recovery based deraining algorithm that removes rain streaks from video footage. The proposed method detects rain streaks by adopting optical flow estimation along with the brightness features inherent to rain streaks. A unified framework comprising tensor singular value decomposition (t-SVD) based weighted nuclear norm minimisation and tensor total variation (TTV) regularisation effectively removes rain streaks and recovers the original rain-free data from the available rainy data. The use of t-SVD enforces low rankness and exploits the temporal redundancy among video frames. Furthermore, TTV regularisation promotes temporal continuity, discriminating most natural image content from sparse rain streaks by preserving the piece-wise smoothness of video frames. Comprehensive experiments on real and synthetic data with dynamic backgrounds show that rain streaks are eliminated more efficaciously by the proposed method without much loss of information.
- Author(s): Junjie Deng ; Gege Luo ; Caidan Zhao
- Source: IET Image Processing, Volume 14, Issue 14, p. 3613 –3622
- DOI: 10.1049/iet-ipr.2020.0003
- Type: Article
- + Show details - Hide details
-
p.
3613
–3622
(10)
Underwater image enhancement algorithms improve image quality and indirectly enhance underwater visibility. Although many underwater image enhancement neural networks have been proposed, they require large amounts of data. To reduce the amount of data required while providing better image enhancement, this study proposes an underwater image colour transfer generative adversarial network (UCT-GAN). The authors first design a non-linear mapping function to generate colour cast images according to original images. Then, the authors utilise these image pairs (i.e. colour cast images and corresponding original images) to guide the UCT-GAN in learning the inverse function of the designed non-linear mapping function. Finally, colour cast images are restored via the inverse function. A data augmentation method based on Poisson fusion and block combination is also proposed to overcome the problem of requiring a large amount of training data. Moreover, the authors extend UCT-GAN into a multi-class colour transfer network to achieve an array of underwater image enhancements. Experimental results indicate that the proposed UCT-GAN can more effectively resolve underwater image colour cast compared to existing algorithms.
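The pair-generation idea can be illustrated with a toy invertible per-channel mapping that simulates an underwater colour cast; the gains and gammas below are illustrative assumptions, not the paper's mapping, and the GAN that learns the inverse function is omitted.

```python
import numpy as np

# A known, invertible non-linear mapping turns clean images into
# colour-cast training pairs; a network would then be trained to
# approximate the inverse.  Here the exact inverse is written out.

GAINS = np.array([0.4, 0.9, 1.0])    # attenuate red most, as water does
GAMMAS = np.array([1.3, 1.1, 1.0])

def add_colour_cast(img):            # img in [0, 1], shape (h, w, 3)
    return GAINS * img ** GAMMAS

def remove_colour_cast(cast):        # exact inverse of the mapping above
    return np.clip(cast / GAINS, 0.0, 1.0) ** (1.0 / GAMMAS)

img = np.random.default_rng(5).random((4, 4, 3))   # toy clean image
cast = add_colour_cast(img)
restored = remove_colour_cast(cast)
```

Because the forward mapping is known, an unlimited supply of (cast, clean) pairs can be generated from unlabelled images, which is what reduces the training-data requirement.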
- Author(s): Yang Zhou ; Xiaoqi Liu ; Yun Zhang ; Haibing Yin ; Yu Lu
- Source: IET Image Processing, Volume 14, Issue 14, p. 3623 –3631
- DOI: 10.1049/iet-ipr.2019.1495
- Type: Article
- + Show details - Hide details
-
p.
3623
–3631
(9)
It can be intuitively inferred that a high-quality depth map can be used to quickly detect the salient region in stereo vision, implying that depth information plays an essential role in stereoscopic visual attention. However, existing methods generally use the depth map as an auxiliary cue to improve the saliency detection performance. In this study, the authors present an algorithm to directly detect the salient object from a high-quality depth image. The proposed algorithm utilises a depth reliability indicator to assess the confidence of a depth image. Depth compactness, a novel feature that incorporates the depth reliability of the super-pixels, is computed as a primary salient feature. Moreover, in order to enhance another salient feature (i.e. depth contrast), they develop a coarse background filtering method to suppress background interference. Experimental results demonstrate that the proposed method performs favourably against the popular depth-aware saliency detection approaches at a lower computational cost.
Automatic 3D point cloud registration algorithm based on triangle similarity ratio consistency
Medical image segmentation using deep learning with feature enhancement
Empirical greedy machine-based automatic liver segmentation in CT images
Guided adaptive interpolation filter
Automatic segmentation and classification of lung tumour using advance sequential minimal optimisation techniques
Proximal approach to denoising hyperspectral images under mixed-noise model
Robust defect detection in 2D images printed on 3D micro-textured surfaces by multiple paired pixel consistency in orientation codes
Corner detection using the point-to-centroid distance technique
Dense optical flow based background subtraction technique for object segmentation in moving camera environment
Simple algorithm for L1-norm regularisation-based compressed sensing and image restoration
Outlier percentage estimation for shape- and parameter-independent outlier detection
Blind atmospheric turbulence deconvolution
Learning an adaptive model for extreme low-light raw image processing
Recognition of cursive video text using a deep learning framework
Volumetric lung nodule segmentation in thoracic CT scan using freehand sketch
Efficient image classification technique for weather degraded fruit images
Image decomposition and denoising using fractional-order partial differential equations
FI-GAN: fewer iterations GAN for rapid synthesis controllable realistic images
Modification and hardware implementation of cortex-like object recognition model
Grain segmentation of sandstone images based on convolutional neural networks and weighted fuzzy clustering
Compressive sensing MR imaging based on adaptive tight frame and reference image
Automated unsupervised learning-based clustering approach for effective anomaly detection in brain magnetic resonance imaging (MRI)
Progressive learning with style transfer for distant domain adaptation
Deep facial emotion recognition in video using eigenframes
Adaptive trainable non-linear reaction diffusion for Rician noise removal
Infrared and visible image fusion using a shallow CNN and structural similarity constraint
Image compression using adaptive multiresolution image decomposition algorithm
Multi-view hand gesture recognition via pareto optimal front
Highly accurate 3D reconstruction based on a precise and robust binocular camera calibration method
Correlation scan matching algorithm based on multi-resolution auxiliary historical point cloud and lidar simultaneous localisation and mapping positioning application
Tensor total variation regularised low-rank approximation framework for video deraining
UCT-GAN: underwater image colour transfer generative adversarial network
Salient object detection via reliability-based depth compactness and depth contrast