IET Image Processing
Volume 14, Issue 11, 18 September 2020
Survey of single image super-resolution reconstruction
- Author(s): Kai Li ; Shenghao Yang ; Runting Dong ; Xiaoying Wang ; Jianqiang Huang
- Source: IET Image Processing, Volume 14, Issue 11, p. 2273–2290
- DOI: 10.1049/iet-ipr.2019.1438
- Type: Article
Image super-resolution reconstruction refers to a technique of recovering a high-resolution (HR) image (or multiple images) from a low-resolution (LR) degraded image (or multiple images). Following the breakthrough progress of deep learning in other computer vision tasks, researchers have introduced deep neural networks and addressed single image super-resolution (SISR) by constructing deep networks for end-to-end training. The deep learning models currently in use can be divided into four types: interpolation-based preprocessing models, models that operate directly on the original LR image, hierarchical feature-based models, and high-frequency detail-based (or shared-network) models. The current challenges for super-resolution reconstruction arise mainly in practical application, such as encountering an unknown scaling factor, lacking paired LR–HR images, and so on.
Urdu handwritten text recognition: a survey
- Author(s): Mujtaba Husnain ; Malik Muhammad Saad Missen ; Shahzad Mumtaz ; Mickaël Coustaty ; Muzzamil Luqman ; Jean-Marc Ogier
- Source: IET Image Processing, Volume 14, Issue 11, p. 2291–2300
- DOI: 10.1049/iet-ipr.2019.0401
- Type: Article
Work on handwritten text recognition in Urdu script has been an active research area, and significant progress has been made in this interesting and challenging field in the last few years. In this study, the authors present a comprehensive survey of offline and online handwritten text recognition systems for Urdu script written in the Nastaliq font style, covering 2004 to 2019. The following features make their contribution worthwhile and unique among reviews of a similar kind: (i) the review classifies existing studies based on the types of recognition systems used for Urdu handwritten text; (ii) it covers the recognition process of Urdu handwritten text at different granularity levels (e.g. character, word, ligature, or sentence level); (iii) it presents each surveyed article along the following dimensions: the task performed, its granularity level, the dataset used, the results obtained, and future directions; and (iv) it summarises the surveyed articles according to granularity level, publishing year, related tasks or subtasks, and types of classifiers used. Finally, major challenges and tasks related to Urdu handwritten text recognition approaches are discussed in detail.
Symbol positions-based Slepian–Wolf coding with application to distributed video coding
- Author(s): Said Benierbah and Mohammed Khamadja
- Source: IET Image Processing, Volume 14, Issue 11, p. 2301–2309
- DOI: 10.1049/iet-ipr.2018.5942
- Type: Article
In this study, the authors show that coding the positions of symbols, instead of their values, can be a good way to implement efficient Slepian–Wolf (SW) coding and can reduce the complexity of both the encoder and the decoder. They also propose a practical distributed video coding (DVC) system that exploits this idea. The system uses binary maps to indicate the positions of the most probable symbols, instead of separating symbols into bitplanes. Simulations show that this position-based SW coding allows a simpler and more efficient DVC system with improved rate-distortion performance, compared to a bitplane-based DVC system using the same side information. The memory requirements at the encoder are reduced by about 50% and the number of channel coding operations is also reduced. The DVC system also provides an easy way to control quantisation, reduces decoding latency, and allows fast parallel decoding.
Novel image encryption by combining dynamic DNA sequence encryption and the improved 2D logistic sine map
- Author(s): Jieyu Zheng and LingFeng Liu
- Source: IET Image Processing, Volume 14, Issue 11, p. 2310–2320
- DOI: 10.1049/iet-ipr.2019.1340
- Type: Article
One-dimensional (1D) chaotic encryption algorithms achieve good encryption performance owing to properties of the chaotic sequence such as high complexity, pseudo-randomness, and sensitivity to the initial value. However, compared with other methods, their biggest drawback is that the key space is too small. To address these problems, the authors introduce an improved 2D logistic sine chaotic map (2D-LSMM), in which the logistic map controls the input of the sine map, and present a novel image encryption scheme based on dynamic DNA sequence encryption and the improved 2D-LSMM. The encoding and operation rules of the DNA sequences are determined by 2D-LSMM chaotic sequences. By implementing dynamic DNA sequence encryption, the encryption process becomes more complicated and harder to attack. Simulation results and security analysis show that the encryption scheme not only achieves proper encryption but can also resist different attacks.
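A minimal sketch of the chaotic-keystream idea described above, assuming a simple logistic–sine coupling in which the logistic term drives the sine map; the paper's actual 2D-LSMM definition and its DNA encoding/operation rules are not reproduced here.

```python
import numpy as np

def logistic_sine_keystream(shape, x0=0.345, r=3.99, burn_in=1000):
    """Illustrative 1D logistic-sine coupling: the logistic map output
    drives the argument of a sine map. Not the paper's exact 2D-LSMM."""
    n = int(np.prod(shape))
    x = x0
    out = np.empty(n)
    for i in range(-burn_in, n):
        x = abs(np.sin(np.pi * r * x * (1.0 - x)))  # logistic term fed into sine
        if i >= 0:
            out[i] = x
    return (out * 255).astype(np.uint8).reshape(shape)

def xor_encrypt(image, key=0.345):
    ks = logistic_sine_keystream(image.shape, x0=key)
    return np.bitwise_xor(image, ks)                # decryption is the same XOR
```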
Background subtraction using infinite asymmetric Gaussian mixture models with simultaneous feature selection
- Author(s): Ziyang Song ; Samr Ali ; Nizar Bouguila
- Source: IET Image Processing, Volume 14, Issue 11, p. 2321–2332
- DOI: 10.1049/iet-ipr.2019.1029
- Type: Article
Mixture models are broadly applied in image processing. Existing challenges include failure to approximate exact data shapes, to estimate the correct number of components, and to ignore irrelevant features. In this study, the authors develop a statistical self-refinement framework for the background subtraction task using a Dirichlet process-based asymmetric Gaussian mixture model, whose parameters are learned with variational inference. Feature selection is incorporated simultaneously within the framework to avoid noisy influence from uninformative features. To validate the proposed framework, results are reported for background subtraction on eight different datasets of infrared and visible videos.
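For orientation only, the sketch below runs a classical finite Gaussian-mixture background subtractor (OpenCV's MOG2) on a hypothetical video file; the authors' model is instead an infinite asymmetric Gaussian mixture learned by variational inference with feature selection.

```python
import cv2

# Generic Gaussian-mixture background subtraction baseline (OpenCV MOG2),
# shown only to illustrate the task; it is not the paper's model.
cap = cv2.VideoCapture("video.mp4")           # hypothetical input file
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)            # per-pixel foreground mask
    cv2.imshow("foreground", mask)
    if cv2.waitKey(30) == 27:                 # Esc to quit
        break
cap.release()
```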
Robust palmprint identification using efficient enhancement and two-stage matching technique
- Author(s): Mubeen Ghafoor ; Syed Ali Tariq ; Imtiaz A. Taj ; Noman M. Jafri ; Tehseen Zia
- Source: IET Image Processing, Volume 14, Issue 11, p. 2333–2342
- DOI: 10.1049/iet-ipr.2018.5736
- Type: Article
Palmprint-based human authentication has shown great potential for civil, forensic, and corporate security applications in recent years. Palmprint recognition systems suffer because of large palmprint sizes and the presence of a large number of creases and erroneous minutiae that make the enhancement and matching phases a challenge. In this study, a novel approach is presented based on efficient enhancement and a two-stage matching technique that demonstrates highly accurate identification results. The enhancement approach extracts minutia features from high-quality regions based on local ridge characteristics. The selected minutiae are then matched using a two-stage local and global minutiae neighbour-based matching technique. To demonstrate the performance of the proposed technique, comparisons with open-source algorithms are made based on equal error rate and detection error trade-off graph. The results confirm the efficacy of proposed palmprint enhancement and identification technique.
Conditional semi-fuzzy c-means clustering for imbalanced dataset
- Author(s): Yunlong Gao ; Chengyu Yang ; Kuo-Yi Lin ; Jinyan Pan ; Li Li
- Source: IET Image Processing, Volume 14, Issue 11, p. 2343–2355
- DOI: 10.1049/iet-ipr.2019.0253
- Type: Article
Fuzzy c-means algorithms have been widely utilised in areas such as image segmentation, pattern recognition and data mining. However, related studies have shown their limitations when facing imbalanced datasets: the maximum fuzzy boundary tends to be located on the largest cluster, which is not desirable, and the overall fuzzy partition results in false grouping of edge objects and weakens cluster compactness. It is important that clusters are delineated by the maximum fuzzy boundary. In this study, a semi-fuzzy c-means algorithm is proposed by combining hard partition and soft partition, aiming to provide an effective partition for edge objects so that cluster compactness can be improved. The proposed algorithm integrates the semi-fuzzy c-means method with the size-insensitive integrity-based fuzzy c-means algorithm, the latter of which is able to deal with imbalanced data. Experiments on synthetic and widely known benchmark datasets show that the proposed algorithm is robust and outperforms the two component algorithms.
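A compact sketch of the plain (soft-partition) fuzzy c-means update that the proposed semi-fuzzy variant builds on; the hard-partition step and the size-insensitive integrity-based component are not included.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means (soft partition only); the paper additionally
    mixes in a hard partition and a size-insensitive variant."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                    # fuzzy memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-10
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)                # renormalise rows
    return centers, U
```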
Physics-based dynamic texture analysis and synthesis model using GPU
- Author(s): Premanand Pralhad Ghadekar
- Source: IET Image Processing, Volume 14, Issue 11, p. 2356–2364
- DOI: 10.1049/iet-ipr.2019.0984
- Type: Article
A texture is a repetition of a particular structure. Textures are classified into static textures and dynamic textures, and there are two approaches to synthesising dynamic textures: the image-based approach and the physics-based approach. The proposed work demonstrates the synthesis of different dynamic textures using the physics-based approach. Different mathematical models are proposed that give appropriate motion to the dynamic textures. Raw frames are created and then used to synthesise the dynamic textures using physics laws and mathematical formulae. The flexibility of each model is demonstrated; the proposed model has lower specificity and lower computational complexity. The proposed algorithms are implemented on the graphics processing unit to reduce the overall execution time and time complexity. High-quality videos are produced, and the evaluation of every frame assures the quality.
Sparse representation based computed tomography images reconstruction by coupled dictionary learning algorithm
- Author(s): Farah Deeba ; She Kun ; Fayaz Ali Dharejo ; Yuanchun Zhou
- Source: IET Image Processing, Volume 14, Issue 11, p. 2365–2375
- DOI: 10.1049/iet-ipr.2019.1312
- Type: Article
Reconstructing high-resolution computed tomography (CT) images is of great value, as such images help clinicians analyse disease. This study proposes an improved super-resolution method for CT medical images in the sparse representation domain with dictionary learning. The sparse coupled K-singular value decomposition (KSVD) algorithm is employed for dictionary learning. Images are divided into two sets, low resolution (LR) and high resolution (HR); to improve the quality of the LR images, the authors train dictionaries over LR and HR image patches using the KSVD algorithm. The main idea behind the proposed method is that the coupled sparse dictionaries learn from each patch and establish the relationship between the sparse coefficients of LR and HR image patches, so that the HR patch can be recovered for the LR image. The proposed method is compared with conventional algorithms in terms of mean peak signal-to-noise ratio and structural similarity index using three different datasets: CT chest, CT dental and CT brain images. The authors also analyse the improved method for different dictionary sizes and patch sizes, as these parameters play an essential role in the reconstruction of HR images.
Feature channel enhancement for crowd counting
- Author(s): Xingjiao Wu ; Shuchen Kong ; Yingbin Zheng ; Hao Ye ; Jing Yang ; Liang He
- Source: IET Image Processing, Volume 14, Issue 11, p. 2376–2382
- DOI: 10.1049/iet-ipr.2019.1308
- Type: Article
Crowd counting, i.e. counting the number of people in a crowded visual space, is emerging as an essential research problem in public security. A key issue in the design of a crowd counting system is to create a stable, accurate, and robust model, which requires processing of the feature channels of the counting network. In this study, the authors present a feature channel enhancement (FCE) block for crowd counting. First, a feature extraction unit obtains and encodes the information of each channel; a non-linear variation unit then processes the encoded channel information; finally, the data are normalised and applied to each channel separately. With the FCE, positive feature channels can be enhanced and weak or negative channel information can be suppressed. The authors incorporate the FCE into two compact networks on standard benchmarks and show that the proposed FCE achieves promising results.
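The FCE block is described as encode per-channel information, apply a non-linear unit, normalise, and rescale each channel. The sketch below shows a squeeze-and-excitation-style block that follows this outline; the layer sizes and exact units are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelEnhancement(nn.Module):
    """Squeeze-and-excitation style channel reweighting: encode per-channel
    statistics, pass them through a non-linear unit, normalise to (0, 1) and
    rescale each channel accordingly."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # per-channel descriptor
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # normalise to (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # enhance/suppress channels
```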
Approach to model human appearance based on sparse representation for human tracking in surveillance
- Author(s): Sangeetha Damotharasamy
- Source: IET Image Processing, Volume 14, Issue 11, p. 2383–2394
- DOI: 10.1049/iet-ipr.2018.5961
- Type: Article
In human tracking, sparse representation successfully localises the human in a video with minimal reconstruction error using target templates. However, the state-of-the-art approaches use colour and local appearance of a human to discriminate the human from the background regions, and hence fail when the human is occluded and appears in the varying illumination environment. In this study, a robust tracking algorithm is proposed that utilises gradient orientation and fine and coarse sparse representation of the target template. Sparse representation-based human appearance model utilises weighted gradient orientation that is insensitive to illumination variation. Coarse and fine representation of sparse code facilitates tracking under varying scales. Subspace learning from image gradient orientation is enforced with occlusion detection during the dictionary updation stage to capture the visual characteristics of the local human appearance that supports tracking under partial occlusion with lesser tracking error. The proposed human tracking algorithm is evaluated on various datasets and shows efficient human tracking performance when compared to the other state-of-the-art approaches. Furthermore, the proposed human tracking algorithm is suitable for surveillance applications.
Multi-head mutual-attention CycleGAN for unpaired image-to-image translation
- Author(s): Wei Ji ; Jing Guo ; Yun Li
- Source: IET Image Processing, Volume 14, Issue 11, p. 2395–2402
- DOI: 10.1049/iet-ipr.2019.1153
- Type: Article
Image-to-image translation, i.e. translation from a source image domain to a target image domain, has made significant progress in recent years. The most popular method for unpaired image-to-image translation is CycleGAN; however, it often cannot accurately and rapidly learn the key features of the target domain, so the model learns slowly and the translation quality needs to be improved. In this study, a multi-head mutual-attention CycleGAN (MMA-CycleGAN) model is proposed for unpaired image-to-image translation. MMA-CycleGAN retains the cycle-consistency loss and adversarial loss of CycleGAN, but introduces a mutual-attention (MA) mechanism that allows attention-driven, long-range dependency modelling between the two image domains. To deal efficiently with large image sizes, the MA is further extended to a multi-head mutual-attention (MMA) mechanism. In addition, domain labels are adopted to simplify the MMA-CycleGAN architecture, so only one generator is required to perform bidirectional translation. Experiments on multiple datasets demonstrate that MMA-CycleGAN learns rapidly and obtains photo-realistic images in a shorter time than CycleGAN.
New flexible directional filter bank by tuning Hermite transform parameters for content based medical image retrieval
- Author(s): Amita Shinde ; Amol Rahulkar ; Chetankumar Patil
- Source: IET Image Processing, Volume 14, Issue 11, p. 2403–2416
- DOI: 10.1049/iet-ipr.2019.0252
- Type: Article
This study presents a flexible directional filter bank (DFB), obtained by tuning Hermite transform parameters, for content-based medical image retrieval (CBMIR). Two key Hermite transform parameters, the scale of the Gaussian kernel and the order of the Hermite polynomial, are tuned to adapt the flexible Hermite orthogonal basis and improve the retrieval performance of the CBMIR system. First, the tuned 1D Hermite filter is transformed into a 2D diamond-shaped filter by the McClellan transformation, making it suitable for directional decomposition of images using the DFB structure. The dominant features of the directionally decomposed images are extracted by the rotation-invariant local neighbourhood frequency pattern. The mean retrieval precision is used as a high-level criterion to measure retrieval performance. The proposed method is assessed on three medical image databases, two for computed tomography and one for magnetic resonance imaging, and achieves 99.41, 89.07, and 88.71% mean precision, respectively, when ten images are returned by the system.
E2-capsule neural networks for facial expression recognition using AU-aware attention
- Author(s): Shan Cao ; Yuqian Yao ; Gaoyun An
- Source: IET Image Processing, Volume 14, Issue 11, p. 2417–2424
- DOI: 10.1049/iet-ipr.2020.0063
- Type: Article
Capsule neural network is a new and popular technique in deep learning. However, the traditional capsule neural network does not extract features sufficiently before the dynamic routing between capsules. In this study, one double enhanced capsule neural network (E2-Capsnet) that uses AU-aware attention for facial expression recognition (FER) is proposed. The E2-Capsnet takes advantage of dynamic routing between capsules and has two enhancement modules which are beneficial to FER. The first enhancement module is the convolutional neural network with AU-aware attention, which can focus on the active areas of the expression. The second enhancement module is the capsule neural network with multiple convolutional layers, which enhances the ability of the feature representation. Finally, the squashing function is used to classify the facial expression. The authors demonstrate the effectiveness of E2-Capsnet on the two public benchmark datasets, RAF-DB and EmotioNet. The experimental results show that their E2-Capsnet is superior to the state-of-the-art methods. The code is available at https://github.com/ShanCao18/E2-Capsnet.
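The squashing function mentioned above is the standard capsule-network non-linearity; a minimal sketch:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule squashing non-linearity: shrinks short vectors towards 0 and
    long vectors towards unit length, preserving orientation."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

v = squash(np.array([[3.0, 4.0]]))   # length 5 -> length ~0.96, same direction
```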
ResDNN: deep residual learning for natural image denoising
- Author(s): Gurprem Singh ; Ajay Mittal ; Naveen Aggarwal
- Source: IET Image Processing, Volume 14, Issue 11, p. 2425–2434
- DOI: 10.1049/iet-ipr.2019.0623
- Type: Article
Image denoising is a thoroughly studied research problem in image processing and computer vision. In this work, a deep convolutional neural network with the added benefit of residual learning is proposed for image denoising. The network is composed of convolution layers and ResNet blocks along with rectified linear unit activation functions, and it learns end-to-end mappings from noise-distorted images to restored cleaner versions. Deeper networks tend to be challenging to train and often suffer from vanishing gradients; residual learning and orthogonal kernel initialisation keep the gradients in check. The skip connections in the ResNet blocks pass the learned abstractions further down the network in the forward pass, thus achieving better results. With a single model, different levels of Gaussian noise can be tackled efficiently. Experiments on benchmark datasets show that the proposed model obtains a significant improvement in structural similarity index over previously existing state-of-the-art techniques.
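A minimal residual block of the kind described (convolution layers with a skip connection); the channel width is an illustrative assumption, not the paper's configuration.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block commonly used in denoising networks: two 3x3
    convolutions whose output is added back to the input."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)       # skip connection keeps gradients healthy
```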
Acceleration of multi-task cascaded convolutional networks
- Author(s): Long-Hua Ma ; Hang-Yu Fan ; Zhe-Ming Lu ; Dong Tian
- Source: IET Image Processing, Volume 14, Issue 11, p. 2435–2441
- DOI: 10.1049/iet-ipr.2019.0141
- Type: Article
Multi-task cascaded convolutional neural network (MTCNN) is a human face detection architecture that uses a cascaded structure with three stages (P-Net, R-Net and O-Net). The authors aim to reduce the computation time of the whole MTCNN pipeline. They find that the non-maximum suppression (NMS) processes after P-Net occupy over half of the computation time, and therefore propose a self-fine-tuning method that makes the computation time of the NMS process easier to control. Self-fine-tuning is a training trick in which hard samples generated by P-Net are used to retrain P-Net. After self-fine-tuning, the distribution of face probabilities generated by P-Net changes and its tail becomes thinner; the number of NMS input boxes is then easier to control, and choosing a suitable threshold to filter the face boxes generates fewer boxes, so the computation time is reduced. To preserve the performance of MTCNN, the authors also propose a landmark dataset augmentation, which enhances the performance of the self-fine-tuned MTCNN. Experiments show that the proposed scheme can significantly reduce the computation time of MTCNN.
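Greedy NMS, whose input-box count is what the self-fine-tuning aims to shrink, can be sketched as follows (a generic implementation, not MTCNN-specific):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-10)
        order = order[1:][iou <= iou_thresh]     # drop highly overlapping boxes
    return keep
```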
Detection, quantification and classification of ripened tomatoes: a comparative analysis of image processing and machine learning
- Author(s): Kazy Noor e Alam Siddiquee ; Md. Shabiul Islam ; Mohammad Yasin Ud Dowla ; Karim Mohammed Rezaul ; Vic Grout
- Source: IET Image Processing, Volume 14, Issue 11, p. 2442–2456
- DOI: 10.1049/iet-ipr.2019.0738
- Type: Article
In this study, two distinct methods for detecting ripe/unripe tomatoes, with or without defects, in the crop field are described and compared using images captured by a camera mounted on a mobile robot. One is a machine learning approach, the (Viola–Jones) 'Cascaded Object Detector' (COD), and the other is a composition of traditional customised methods, individually known as 'Colour Transformation', 'Colour Segmentation' and 'Circular Hough Transformation'. The COD generates histogram of oriented gradients (HOG) features to detect tomatoes, and ripeness is checked from the RGB mean with a set of rules. For the traditional methods, colour thresholding is applied to detect tomatoes against either a natural or a solid background, and RGB colour is adjusted to identify ripened tomatoes. The traditional method, with a run-time complexity of O(n³) in the best and average cases, is shown to be feasible for microcontroller-based miniature electronic devices. Comparisons show that the accuracy of the machine learning method is 95%, better than that of the colour segmentation method implemented in MATLAB.
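A rough sketch of the traditional colour-segmentation plus circular-Hough pipeline using OpenCV; the HSV thresholds, Hough parameters, and file name are assumptions that would need tuning for real field images.

```python
import cv2
import numpy as np

# Colour segmentation for red (ripe) tomatoes followed by a circular Hough
# transform; parameter values are illustrative only.
img = cv2.imread("tomatoes.jpg")                       # hypothetical image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
red = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
      cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
red = cv2.morphologyEx(red, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

circles = cv2.HoughCircles(cv2.medianBlur(red, 5), cv2.HOUGH_GRADIENT,
                           dp=1.2, minDist=30, param1=100, param2=20,
                           minRadius=10, maxRadius=80)
if circles is not None:
    print("ripe tomato candidates:", len(circles[0]))
```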
Perceptual accessible image encryption scheme conjugating multiple chaotic maps
- Author(s): Ram Chandra Barik and Suvamoy Changder
- Source: IET Image Processing, Volume 14, Issue 11, p. 2457–2468
- DOI: 10.1049/iet-ipr.2019.0527
- Type: Article
In the recent decade, chaos-based image encryption algorithms have gained attention, each with its own pros and cons. The authors propose one such image encryption algorithm for different colour spaces, using a feed-forward initial condition to pursue random permutation by combining a 1D logistic map with a series of tent maps. The algorithm adds a further novel step to protect and encrypt a binary image by concealing it within a cover grey image using bit-plane decomposition. Sensitivity to initial conditions is exploited by dividing the image into blocks and applying a stream of random initial conditions to each block for encryption, along with an XOR operation, which leads to a large key space. For performance analysis, the proposed scheme has been tested on many colour-space images, including a binary image. The proposed technique is efficient with respect to time complexity and resistant to vulnerabilities.
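An illustrative block-wise permute-and-XOR step driven by a 1D logistic map; the paper's feed-forward initial conditions, tent-map chain, and bit-plane concealment of the binary image are not reproduced here.

```python
import numpy as np

def logistic_sequence(n, x0=0.6, r=3.9999, burn_in=500):
    x, seq = x0, np.empty(n)
    for i in range(-burn_in, n):
        x = r * x * (1.0 - x)                      # classic 1D logistic map
        if i >= 0:
            seq[i] = x
    return seq

def encrypt_block(block, x0):
    """Permute pixels with a logistic-map ordering, then XOR-diffuse with a
    second chaotic keystream; a sketch of the core idea only."""
    flat = block.ravel()
    perm = np.argsort(logistic_sequence(flat.size, x0))
    ks = (logistic_sequence(flat.size, x0 / 2 + 0.1) * 255).astype(np.uint8)
    return np.bitwise_xor(flat[perm], ks).reshape(block.shape)
```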
Automatic food recognition system for middle-eastern cuisines
- Author(s): Marwa Qaraqe ; Muhammad Usman ; Kashif Ahmad ; Amir Sohail ; Ali Boyaci
- Source: IET Image Processing, Volume 14, Issue 11, p. 2469–2479
- DOI: 10.1049/iet-ipr.2019.1051
- Type: Article
Concerns for a healthier diet are increasing day by day, especially for diabetics, for whom a healthier diet can only be achieved by keeping track of daily food intake and glucose levels. As a consequence, there is an ever-increasing need for automatic tools that help diabetics manage their diet and help physicians better analyse the effects of various types of food on the glucose levels of diabetics. In this paper, we propose an intelligent food recognition and tracking system for diabetics, intended as an essential part of a mobile application that couples food intake with blood glucose levels using glucose-measuring sensors. For food recognition, we rely on several feature extraction and classification techniques, used individually and jointly via an early fusion and three different late fusion techniques, namely (i) particle swarm optimisation (PSO)-based fusion, (ii) genetic algorithm (GA)-based fusion and (iii) simple averaging. Moreover, we evaluate the performance of several handcrafted and deep features and compare the results against the state-of-the-art. In addition, we collect a large-scale dataset containing images of several types of local Middle-Eastern food, which is intended to become a powerful support tool for future research in the domain.
Dynamic gesture recognition based on feature fusion network and variant ConvLSTM
- Author(s): Yuqing Peng ; Huifang Tao ; Wei Li ; Hongtao Yuan ; Tiejun Li
- Source: IET Image Processing, Volume 14, Issue 11, p. 2480–2486
- DOI: 10.1049/iet-ipr.2019.1248
- Type: Article
Gesture is a natural form of human communication and is of great significance in human–computer interaction. In dynamic gesture recognition based on deep learning, the key is to obtain comprehensive gesture feature information. Aiming at the problem of inadequate extraction of spatiotemporal features or loss of feature information in current dynamic gesture recognition, a new gesture recognition architecture is proposed that combines a feature fusion network with a variant convolutional long short-term memory (ConvLSTM). The architecture extracts spatiotemporal feature information from local, global and deep aspects, and uses feature fusion to alleviate the loss of feature information. Firstly, local spatiotemporal feature information is extracted from the video sequence by a 3D residual network based on channel feature fusion. Then the variant ConvLSTM, in which an attention mechanism changes the gate structure, learns the global spatiotemporal information of the dynamic gesture. Finally, a multi-feature fusion depthwise separable network is used to learn higher-level features, including depth feature information. The proposed approach obtains very competitive performance on the Jester dataset, with a classification accuracy of 95.59%, and achieves state-of-the-art performance with 99.65% accuracy on the SKIG (Sheffield Kinect Gesture) dataset.
Vanishing point detection using the teaching learning-based optimisation algorithm
- Author(s): Alan López-Martinez and Francisco J. Cuevas
- Source: IET Image Processing, Volume 14, Issue 11, p. 2487–2494
- DOI: 10.1049/iet-ipr.2019.0516
- Type: Article
Within the computer vision field, estimating image vanishing points has many applications, including robotic navigation, camera calibration, image understanding, visual measurement and 3D reconstruction. Some methods for detecting vanishing points rely on accumulator-space techniques, while others employ a heuristic approach such as RANSAC; both types of methods suffer from low accuracy or high computational cost. This paper focuses on improving the efficiency of the metaheuristic search for vanishing points by using a recently proposed population-based method, the teaching–learning-based optimisation (TLBO) algorithm, a metaheuristic technique inspired by the teaching–learning process. In the proposed method, the TLBO algorithm is applied after line segment detection to cluster line segments according to their most consistent vanishing point, so the algorithm detects both orthogonal and non-orthogonal vanishing points in real images. Comparisons and tests against other approaches validate the accuracy and efficiency of the proposed method, which achieved an average computational time of 1.42 seconds, a cumulative focal length error of 1 pixel, and a cumulative angular error of 0.1°.
Extreme learning machine with feature mapping of kernel function
- Author(s): Zhaoxi Wang ; Shengyong Chen ; Rongwei Guo ; Bin Li ; Yangbo Feng
- Source: IET Image Processing, Volume 14, Issue 11, p. 2495–2502
- DOI: 10.1049/iet-ipr.2019.1016
- Type: Article
Kernel-based extreme learning machine (KELM) solves the problem of random initialisation of extreme learning machine (ELM), and it has a faster learning speed and higher learning accuracy. However, when it comes to a scenario in which the dimensionality of kernel function mapping space is less than the number of samples, the kernel function theoretically cannot be introduced into ELM. To solve this problem, ELM with feature mapping (FM) of kernel function (FM-KELM) is proposed in this study, in which the random FM between the input layer and hidden layer of ELM is replaced with the FM of the kernel function. Moreover, the authors prove that when the regularised parameter C is close to zero, the solution of introduced kernel function is approximately equal to the correct solution. The proposed algorithm is more robust than KELM for the parameter C. Several experimental results show that the proposed algorithm in this study achieves higher classification accuracy without excessive parameter tuning, and the duration of the training and testing process is significantly reduced.
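For context, the standard KELM closed-form solution that FM-KELM modifies can be sketched as follows; the RBF kernel and parameter values are assumptions, and the paper's feature-mapping variant is not shown.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_train(X, T, C=100.0, gamma=0.1):
    """Kernel ELM closed-form solution: beta = (I/C + K)^-1 T, where K is the
    kernel Gram matrix and T holds one-hot targets."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(Xtest, Xtrain, beta, gamma=0.1):
    return rbf_kernel(Xtest, Xtrain, gamma) @ beta   # class = argmax per row
```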
Seed picking crossover optimisation algorithm for semantic segmentation from images
- Source: IET Image Processing, Volume 14, Issue 11, p. 2503–2511
- DOI: 10.1049/iet-ipr.2019.1189
- Type: Article
Semantic image segmentation treats object recognition and image segmentation as a combined task. The chief notion of semantic segmentation is to partition the image into visually uniform regions and to discriminate the class of the partitioned regions; pixel classification is then done over the segmented regions by assigning semantic labels. In general, inference frameworks are fed with a combination of low-level features and high-level contextual cues to segment an image. Since these combinations are rarely object consistent, they result in low classification accuracy because non-influential features and cues are chosen to track specific objects. To overcome this problem, a nature-inspired meta-heuristic optimization algorithm called seed picking crossover optimization (SPCO) is proposed to optimize, i.e. train, the conditional random field (CRF) so that relevant features are chosen to segment objects with high accuracy. To recognize the objects effectively, a semi-segmentation process is initially performed using the simple linear iterative clustering (SLIC) algorithm. For pixel transformation and pixel association, a Dirichlet process mixture model and a CRF are employed, with the CRF parameters optimized using the proposed SPCO algorithm. The proposed work achieves 84% classification accuracy, with performance evaluation on the MSRC-21 dataset.
A bi-directional fractional-order derivative mask for image processing applications
- Author(s): Meriem Hacini ; Fella Hachouf ; Abdelfatah Charef
- Source: IET Image Processing, Volume 14, Issue 11, p. 2512–2524
- DOI: 10.1049/iet-ipr.2019.0467
- Type: Article
Fractional computation has recently become a major mathematical tool in image and signal processing. This study presents a novel operator for two-dimensional fractional differentiation, developed as an extension of the one-dimensional Charef fractional differentiation. A new multi-directional mask is proposed and a new adaptive fractional-order computation is introduced; the proposed method exploits gradient computation properties. It has been applied to edge detection and de-noising problems using real and synthetic images, and the results have been compared to those given by commonly used integer-order and fractional-order operators. The results demonstrate that the fractional edge images obtained using the proposed operator have more complete and clearer contour information and richer texture detail, and that the proposed method improves performance.
Structure–texture image decomposition using a new non-local TV-Hilbert model
- Author(s): Yehu Lv
- Source: IET Image Processing, Volume 14, Issue 11, p. 2525–2531
- DOI: 10.1049/iet-ipr.2019.0392
- Type: Article
Combining the advantages of non-local total variation (TV) and the Gabor function, a new Gabor-function-based non-local TV-Hilbert model is presented to separate the structure and texture components of an image. Computationally, by introducing the dual form of the non-local TV, the authors reformulate the non-local TV-Hilbert minimisation problem as a convex–concave saddle-point problem. On the algorithmic side, by transforming Chambolle–Pock's first-order primal–dual algorithm into an equivalent form, the authors propose a proximal-based primal–dual algorithm to solve the saddle-point problem. Finally, experimental results demonstrate that the proposed model outperforms several existing state-of-the-art variational models.
Comparative analysis of texture feature extraction techniques for rice grain classification
- Author(s): Kshetrimayum Robert Singh and Saurabh Chaudhury
- Source: IET Image Processing, Volume 14, Issue 11, p. 2532–2540
- DOI: 10.1049/iet-ipr.2019.1055
- Type: Article
The classification of eight different varieties of rice grain is discussed in this study based on various texture models. Four local texture feature extraction techniques are proposed and three sets of texture features (SET-A, SET-B and SET-C) are formed for the classification task. The performance of the proposed feature sets is compared with existing techniques based on the run length matrix, co-occurrence matrix, size zone matrix, neighbourhood grey tone difference matrix and wavelet decomposition, for the classification of rice grain using a back-propagation neural network (BPNN). The proposed techniques are also tested on publicly available data from Brodatz's texture dataset and their results are compared with other techniques. The classification accuracy of the BPNN classifier is also compared with other statistical classifiers, namely the K-nearest neighbour, linear discriminant and naive Bayes classifiers. It is found that the proposed feature sets yield better classification results on both the rice data and Brodatz's data; feature SET-B is able to classify rice grain with an average classification accuracy of 99.63% using a minimum of six features.
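One of the baseline feature families mentioned above, co-occurrence-matrix (GLCM) descriptors, can be computed as in the sketch below; scikit-image and an MLP stand in for the BPNN, and this is not the proposed SET-A/B/C feature extraction.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neural_network import MLPClassifier

def glcm_features(gray_patch):
    """Co-occurrence-matrix texture descriptors (a baseline the study
    compares against), computed for four orientations at distance 1."""
    glcm = graycomatrix(gray_patch, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Hypothetical usage: X = np.array([glcm_features(p) for p in patches]); y = labels
# clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, y)
```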
DeepJoint segmentation for the classification of severity-levels of glioma tumour using multimodal MRI images
- Author(s): Michael Mahesh K and Arokia Renjit J
- Source: IET Image Processing, Volume 14, Issue 11, p. 2541–2552
- DOI: 10.1049/iet-ipr.2018.6682
- Type: Article
Brain tumour segmentation is the process of separating the tumour from normal brain tissues. A glioma is a kind of tumour that arises in the glial cells of the spine or the brain. This study introduces a technique for classifying the severity levels of glioma tumours using a novel segmentation algorithm, named DeepJoint segmentation, and a multi-classifier. Initially, the brain images are pre-processed and the region of interest is extracted. The pre-processed image is then segmented using the proposed DeepJoint segmentation, which is developed through an iterative procedure of joining grid segments. After segmentation, features are extracted from core and oedema tumours using information-theoretic measures. Finally, classification is done by a deep convolutional neural network (DCNN), which is trained by an optimisation algorithm named the fractional Jaya whale optimiser (FJWO), developed by integrating the whale optimisation algorithm into the fractional Jaya optimiser. The performance of the proposed FJWO–DCNN with DeepJoint segmentation is analysed using accuracy, true positive rate, specificity, and sensitivity. The results show that the proposed method achieves a maximum accuracy of 96%, indicating its superiority.
Robust landmark-free head pose estimation by learning to crop and background augmentation
- Author(s): Aoru Xue ; Kai Sheng ; Songming Dai ; Xiaoqiang Li
- Source: IET Image Processing, Volume 14, Issue 11, p. 2553–2560
- DOI: 10.1049/iet-ipr.2019.1369
- Type: Article
It is well known that the performance of head pose estimation is greatly affected by the bounding box margin around the face and its background. Traditionally, researchers manually choose a suitable bounding box margin to strike a balance between ensuring sufficient information and minimising background noise. However, head pose estimation still degrades when the background is complex in real scenes or when the box margin changes slightly. To make estimation more robust, the authors propose two improvements: (i) a convolutional cropping module that learns to crop the input image to an attentional area for head pose regression, and (ii) background augmentation that makes the network more robust to background noise. Rather than using face landmarks to calculate head pose angles, they use another convolutional neural network to regress the head pose angles, independent of landmark detection results. They evaluate the method on the BIWI and AFLW2000 datasets and the experimental results show that their approach outperforms many other methods; they also evaluate it on the Pointing'04 dataset using head pose accuracy. Furthermore, the approach is more robust and has a lower variance in realistic scenarios.
Self-guided filter for image denoising
- Author(s): Shujin Zhu and Zekuan Yu
- Source: IET Image Processing, Volume 14, Issue 11, p. 2561–2566
- DOI: 10.1049/iet-ipr.2019.1471
- Type: Article
The guided filter has been acknowledged as an exceptional edge-preserving filter whose output is a locally linear transform of the guidance image. However, the traditional guided filter relies heavily on the guidance image and fails to achieve the desired result when performing image denoising without a clear guidance image. In this study, to address this limitation, the authors propose a simple yet effective guided filter variant for single-image noise removal, and show that the proposed denoising strategy can be easily realised within an iterative framework. Moreover, weak-textured-patch-based image noise estimation is utilised to generate a clear intermediate image, which makes the proposed method highly adaptable to the local noise level. Experimental results demonstrate that the proposed algorithm can compete with state-of-the-art local denoising methods in edge preservation.
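The locally linear guided-filter step referred to above, in its classical form with the noisy image used as its own guidance, can be sketched as follows; the window radius, epsilon, and file name are assumptions.

```python
import cv2
import numpy as np

def guided_filter(I, p, radius=8, eps=0.01):
    """Classical guided filter (local linear model); when I == p it becomes
    'self-guided', the degenerate case the paper improves upon."""
    mean = lambda x: cv2.boxFilter(x, -1, (radius, radius))
    mI, mp = mean(I), mean(p)
    cov_Ip = mean(I * p) - mI * mp
    var_I = mean(I * I) - mI * mI
    a = cov_Ip / (var_I + eps)            # local linear coefficients
    b = mp - a * mI
    return mean(a) * I + mean(b)          # locally linear transform of I

noisy = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255
denoised = guided_filter(noisy, noisy)    # guidance = input (self-guided)
```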
SAR multi-target interactive motion recognition based on convolutional neural networks
- Author(s): Ruo-Hong Huan ; Luo-Qi Ge ; Peng Yang ; Chao-Jie Xie ; Kai-Kai Chi ; Ke-Ji Mao ; Yun Pan
- Source: IET Image Processing, Volume 14, Issue 11, p. 2567–2578
- DOI: 10.1049/iet-ipr.2019.0861
- Type: Article
Synthetic aperture radar (SAR) multi-target interactive motion recognition classifies the type of interactive motion and generates descriptions of the interactive motions at the semantic level by considering the relevance of multi-target motions. A method for SAR multi-target interactive motion recognition is proposed, which includes moving target detection, target type recognition, interactive motion feature extraction, and multi-target interactive motion type recognition. Wavelet thresholding denoising combined with a convolutional neural network (CNN) is proposed for target type recognition. The method performs wavelet thresholding denoising on SAR target images and then uses an eight-layer CNN named EilNet to achieve target recognition. After target type recognition, a multi-target interactive motion type recognition method is proposed. A motion feature matrix is constructed for recognition and a four-layer CNN named FolNet is designed to perform interactive motion type recognition. A motion simulation dataset based on the MSTAR dataset is built, which includes four kinds of interactive motions by two moving targets. The experimental results show that the recognition performance of the authors’ Wavelet + EilNet method for target type recognition and FolNet for multi-target interactive motion type recognition are both better than other methods. Thus, the proposed method is an effective method for SAR multi-target interactive motion recognition.
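A sketch of a generic wavelet soft-thresholding denoising step of the kind applied before the EilNet CNN; the wavelet, decomposition level, and universal-threshold rule are assumptions rather than the paper's exact settings.

```python
import numpy as np
import pywt

def wavelet_threshold_denoise(img, wavelet="db4", level=2):
    """Soft-threshold the detail subbands of a 2D wavelet decomposition;
    noise level is estimated from the finest diagonal subband."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745   # noise estimate (HH)
    thr = sigma * np.sqrt(2 * np.log(img.size))          # universal threshold
    new_coeffs = [coeffs[0]] + [
        tuple(pywt.threshold(c, thr, mode="soft") for c in detail)
        for detail in coeffs[1:]
    ]
    return pywt.waverec2(new_coeffs, wavelet)
```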
Hybrid deep emperor penguin classifier algorithm-based image quality assessment for visualisation application in HDR environments
- Author(s): Sunil L. Tade and Vibha Vyas
- Source: IET Image Processing, Volume 14, Issue 11, p. 2579–2587
- DOI: 10.1049/iet-ipr.2019.1371
- Type: Article
One of the main open challenges for visualisation applications such as cathode ray tube (CRT) monitors, liquid-crystal displays (LCDs), and organic light-emitting diode (OLED) displays is robustness in high dynamic range (HDR) environments, owing to sensor imperfections and the inability to track interest points successfully because of brightness constancy. To address this problem, different tone mapping operators are required for visualising HDR images on standard displays; however, these standard displays have different dynamic ranges, so a new model is needed to find the best-quality tone-mapped image for a specific kind of visualisation application. The authors propose a hybrid deep emperor penguin classifier to accurately classify tone-mapped images for different visualisation applications. A selective deep neural network is trained to predict the quality of a tone-mapped image, and based on this quality, a decision is made as to the suitability of the image for a CRT monitor, LCD or OLED display. The proposed model is evaluated on the TMIQD database and the simulation results show that it outperforms state-of-the-art image quality assessment methods.
Non-uniform image blind deblurring by two-stage fully convolution network
- Author(s): Chudan Wu ; Yan Wo ; Guoqing Han ; Zhangyong Wu ; Jiyun Liang
- Source: IET Image Processing, Volume 14, Issue 11, p. 2588–2596
- DOI: 10.1049/iet-ipr.2018.5716
- Type: Article
Deep neural networks have recently demonstrated high performance for deblurring. However, few methods are designed for both non-uniform image blur estimation and removal with high efficiency. In this study, the authors propose a fully convolutional network that outputs the estimated blur and the restored image in one feed-forward pass for a non-uniformly blurred image of any input size. The proposed network contains two subnets: the parameter estimation subnet P-net predicts pixel-wise parameters of multiple blur types with high accuracy, and its output is used as a condition that guides the blur removal subnet G-net to restore a high-quality latent sharp image. P-net and G-net are ultimately integrated into a single framework called PG-net, which guarantees the consistency of parameter estimation and blur removal, thereby improving algorithm efficiency. Experimental results show that the authors' blur parameter estimation method as well as their deblurring method outperform the comparison methods both quantitatively and qualitatively.
Higher order PDE based model for segmenting noisy image
- Author(s): B.V. Rathish Kumar ; Abdul Halim ; Rowthu Vijayakrishna
- Source: IET Image Processing, Volume 14, Issue 11, p. 2597–2609
- DOI: 10.1049/iet-ipr.2019.0885
- Type: Article
In this study, a fourth-order non-linear partial differential equation (PDE) model together with multi-well potential has been proposed for greyscale image segmentation. The multi-well potential is constructed from the histogram of the given image to make the segmentation process fully automatic and unsupervised. Further, the model is refined for effective segmentation of noisy greyscale image. The fourth-order anisotropic term with the multi-well potential is shown to properly segment noisy images. Fourier spectral method in space with semi-implicit convexity splitting in time is used to derive an unconditionally stable scheme. Numerical studies on some standard test images and comparison of results with those in literature clearly depict the superiority of the anisotropic variant of the non-linear PDE model.
Support vector machine classification combined with multimodal magnetic resonance imaging in detection of patients with schizophrenia
- Author(s): Yunsong Zheng ; Hangbin Tong ; Teng Zhao ; Xiaoxia Guo ; Hui Xu ; Ruwu Yang
- Source: IET Image Processing, Volume 14, Issue 11, p. 2610–2615
- DOI: 10.1049/iet-ipr.2019.1108
- Type: Article
The brain images of patients with schizophrenia differ from those of normal human brains, and it is difficult to overcome the complex environment of the brain with traditional magnetic resonance imaging (MRI). In order to improve the accuracy of MRI in detecting brain information in patients with schizophrenia, this study uses the support vector machine classification algorithm combined with a multimodal MRI detection method to construct a detection model suitable for patients with schizophrenia. In addition, the study uses existing test cases to divide the brain into regions and designs a comparative experiment to study the accuracy of the proposed model, drawing its results from a region-by-region comparison. The results show that the proposed model is effective for brain detection in patients with schizophrenia, can be applied in practice, and can provide a theoretical reference for subsequent related research.
Unsupervised multiscale retinal blood vessel segmentation using fundus images
- Author(s): Kamini Upadhyay ; Monika Agrawal ; Praveen Vashist
- Source: IET Image Processing, Volume 14, Issue 11, p. 2616–2625
- DOI: 10.1049/iet-ipr.2019.0969
- Type: Article
Blood vessel segmentation is a vital step in automated diagnosis of retinal diseases. Some retinal diseases progress with structural changes in the vessels whereas in others, vessels may remain unaffected. Segmentation of vessels is inevitable in both the cases. The extracted vessel map can be studied for these structural changes or can be removed to highlight other abnormalities of the retina. This study presents a rule-based retinal blood vessel segmentation algorithm. It implements two multi-scale approaches, local directional-wavelet transform and global curvelet transform, together in a novel manner for vessel enhancement and thereby segmentation. The authors have proposed a generic field-of-view mask for extraction of region-of-interest. Further, a morphological thickness-correction step, to recover vessel-boundary pixels, is also proposed. The significant contribution of this work is, segmentation of fine vessels while preserving the thickness of major vessels. Moreover, the algorithm is robust, as it performs consistently well, on four public databases, DRIVE, STARE, CHASE_DB-1 and HRF. Performance of the proposed algorithm is evaluated in terms of eight measures : accuracy, sensitivity, specificity, precision, F-1 score, G-mean, MCC and AUC, where it has outperformed many other existing methods. Zero data dependency gives the suggested algorithm, an edge over other state-of-the-art supervised methods.
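As a rough stand-in for the enhance-then-threshold pipeline described above, the sketch below uses Frangi vesselness on the green channel plus Otsu thresholding; the paper's directional-wavelet/curvelet enhancement, field-of-view mask, and thickness correction are not reproduced, and the file name is an assumption.

```python
import numpy as np
from skimage import io, filters

# Generic multiscale vessel enhancement baseline: Frangi vesselness on the
# green channel of a fundus image, followed by a global Otsu threshold.
fundus_green = io.imread("fundus.png")[:, :, 1]          # green channel
vesselness = filters.frangi(fundus_green, sigmas=range(1, 6))
vessel_map = vesselness > filters.threshold_otsu(vesselness)
print("vessel pixels:", int(np.count_nonzero(vessel_map)))
```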
Most cited content for this Journal
Medical image segmentation using deep learning: A survey
- Author(s): Risheng Wang ; Tao Lei ; Ruixia Cui ; Bingtao Zhang ; Hongying Meng ; Asoke K. Nandi
- Type: Article
Block-based discrete wavelet transform-singular value decomposition image watermarking scheme using human visual system characteristics
- Author(s): Nasrin M. Makbol ; Bee Ee Khoo ; Taha H. Rassem
- Type: Article
Classification of malignant melanoma and benign skin lesions: implementation of automatic ABCD rule
- Author(s): Reda Kasmi and Karim Mokrani
- Type: Article
Digital image watermarking method based on DCT and fractal encoding
- Author(s): Shuai Liu ; Zheng Pan ; Houbing Song
- Type: Article
Tomato leaf disease classification by exploiting transfer learning and feature concatenation
- Author(s): Mehdhar S. A. M. Al‐gaashani ; Fengjun Shang ; Mohammed S. A. Muthanna ; Mashael Khayyat ; Ahmed A. Abd El‐Latif
- Type: Article