Intelligent Multimedia Processing and Computer Vision: Techniques and applications

2: Center for Research in Computer Vision (CRCV), University of Central Florida, USA
3: ABV-Indian Institute of Information Technology & Management (IIITM), India
Intelligent multimedia involves the computer processing and understanding of perceptual input from speech, text, videos and images. Reacting to these inputs is complex and involves research from engineering, computer science and cognitive science. Intelligent multimedia processing deals with the analysis of images and videos to extract useful information for numerous applications including medical imaging, robotics, remote sensing, autonomous driving, AR/VR, law enforcement, biometrics, multimedia enhancement and reconstruction, agriculture, and security. Intelligent multimedia processing and computer vision have seen an upsurge over the last few years. With the increasing use of intelligent multimedia processing techniques in various sectors, the requirement for fast and reliable techniques to analyse and process multimedia content is increasing day by day.
Intelligent Multimedia Processing and Computer Vision: Techniques and applications reviews cutting-edge research in intelligent multimedia processing and computer vision techniques and applications, with a particular emphasis on interdisciplinary approaches and novel solutions. The book is aimed at practicing engineers, scientists, technology professionals, researchers and advanced students in the fields of multimedia processing and security, image processing, computer vision, biometrics, intelligent and smart technologies, machine learning and deep learning, and autonomous systems.
- Book DOI: 10.1049/PBPC064E
- Chapter DOI: 10.1049/PBPC064E
- ISBN: 9781839537257
- e-ISBN: 9781839537264
- Page count: 369
- Format: PDF

Front Matter
(1)

1 Introduction
p. 1–6 (6)
Multimedia stands as one of the most demanding and exciting aspects of the information era. The processing of multimedia information has been an active research area contributing to many frontiers of today's science and technology as well as many real-world applications. Traditional multimedia and intelligent multimedia are two different areas of multimedia.
Traditional media encompasses the display of images, graphics, audio, and video with possibly touch and virtual reality (VR) linked in. Intelligent multimedia involves computer processing and the understanding of perceptual input from speech, text, and images. Reacting to these inputs is much more complex and involves research from engineering, computer science, and cognitive science. This is the newest area in multimedia research that has seen an upsurge over the last few years and the one where most organizations, universities, and R&D agencies do not have proper expertise.
With the increasing use of intelligent multimedia processing techniques in various fields, the requirement for fast and reliable techniques to analyze and process multimedia content for various purposes is also increasing day by day. For this purpose, artificial intelligence (AI) and machine learning (ML) techniques have been gaining prominence in recent years. This book sheds light on different AI and ML techniques used for intelligent multimedia processing and analysis. Multimedia processing deals with the analysis of images and videos to extract useful information for numerous applications, including medical imaging, robotics, remote sensing, autonomous driving, augmented reality/VR, law enforcement, biometrics, multimedia enhancement and reconstruction, agriculture, and security.
This book presents state-of-the-art research in various fields of multimedia processing and computer vision along with the applications of AI, ML, and deep learning (DL) to perform various processing tasks in the abovementioned areas. This book also provides a detailed discussion of the latest trends in processing tools required for computer vision applications. This is an attempt to provide a practical and adequate platform for researchers and practitioners from all over the world working in the fields of image processing, biometrics, computer vision, ML, and DL.
This book covers cutting-edge research from both academia and industry with a particular emphasis on interdisciplinary approaches, novel techniques, and solutions to provide intelligent multimedia for potential applications. We first cover recent trends, new concepts, and state-of-the-art approaches in the field of multimedia information processing for various emerging applications. We end the book with a chapter on future perspectives and research directions. A brief discussion of the major topics covered in this book is given below.

2 State-of-the-art analysis of deep learning techniques for image segmentation
p. 7–27 (21)
Image segmentation is an important computer vision problem that captures a particular region of interest by leveraging per-pixel image classification. It has significant applications in various fields, such as biomedical engineering, remote sensing, autonomous driving, etc. There has been a gradual evolution towards deep learning image segmentation methods, including convolutional neural networks, recurrent neural networks, encoder-decoder architectures, and generative adversarial networks. This chapter provides a description of deep learning image segmentation networks, offering insights into their architectural details, performance assessment metrics, advantages, and disadvantages.
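As an illustration of the performance assessment metrics mentioned above, the following minimal sketch computes two measures commonly used for segmentation, intersection over union (IoU) and the Dice coefficient, on binary masks. The abstract does not say which metrics the chapter covers, so this choice is an assumption; the function name `iou_and_dice` and the toy masks are hypothetical.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1."""
    intersection = 0
    pred_area = 0
    target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel labelled foreground in both masks
            if p:
                pred_area += 1
            if t:
                target_area += 1
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treat as a perfect match by convention.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


# Toy 2x3 masks (hypothetical data): per-pixel predictions vs. ground truth.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)
print(f"IoU={iou:.3f} Dice={dice:.3f}")
```

Both scores lie between 0 and 1, and Dice is never lower than IoU for the same pair of masks, which is worth keeping in mind when comparing results reported with different metrics.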

3 Biometric-based computer vision for boundless possibilities: process, techniques, and challenges
p. 29–58 (30)
Computer vision (CV) is a part of artificial intelligence (AI) that helps computers to interpret, understand, and develop intuition about real-world objects and scenes in order to annotate, classify, and identify them accurately. CV techniques have been gaining popularity since their inception and are becoming a fundamental part of technological development and digital transformation. To extend CV to human identification, and to exploit the intrinsic properties of image interpretation and understanding, biometric systems have been proposed that use several behavioral and physiological traits of the body. Biometric recognition employs various aspects of AI to enable a computer system to recognize a biometric pattern for identification purposes. It is an indispensable part of multiple applications, such as border control, cyber security, 3D face modeling and recognition, intelligent video surveillance, finger vein recognition, and forensic biometrics, where vision techniques have been integrated into biometric systems in order to perform the desired tasks. The main objective of this chapter is to introduce biometric-based CV and discuss the essential components of biometric technologies for CV. The discussion also includes different processes, state-of-the-art techniques, challenges of biometric-based CV, application areas, the selection criteria for suitable biometrics, and the future of biometric-based CV applications.

4 Channel refinement of fingerprint pre-processing models
p. 59–104 (46)
Deep models are the state of the art for fingerprint pre-processing. However, these models have a very high number of parameters, usually in the millions. As a result, redundancy is observed among the features learnt by deep learning-based fingerprint pre-processing models. A popular technique to help deep models learn distinct and informative features is channel refinement. A recent study has illustrated the capability of channel refinement to improve the generalization of fingerprint enhancement models. Motivated by that study, this chapter presents a detailed analysis of the usefulness of channel refinement in reducing redundancy and imparting generalization ability to fingerprint enhancement models. Furthermore, we extend this study to assess whether channel refinement generalizes to fingerprint region of interest (ROI) segmentation. Extensive experiments on 14 challenging publicly available fingerprint databases and a private database of fingerprints of the rural Indian population are conducted to assess the potential of channel refinement for fingerprint pre-processing models.

5 A review of deep learning approaches for video-based crowd anomaly detection
p. 105–127 (23)
In recent years, video surveillance systems have come into high demand in public and private places to provide security and safety. Video-based crowd anomaly detection (VCAD) is one of the crucial applications of a surveillance system: timely detection and localization of anomalies can prevent massive loss of public or private property and of many lives. Crowd anomalies or abnormal activities can be defined as irregular activities that deviate from normal crowd behavior patterns. Some abnormal activities in crowd scenes are panic, fights, stampedes, congestion, riots, and abandoned luggage, whose real-time detection is paramount. Crowd anomaly detection (CAD) is made more challenging by the dynamic nature of the crowd, the effect of cluttered backgrounds, daylight changes, shape variation due to perspective distortion, and the lack of large-scale ground-truth crowd datasets. Both conventional machine learning and deep learning approaches have been explored to provide different solutions for CAD. The current research trend shows the vast development of deep learning approaches for CAD. However, state-of-the-art reviews still need to address the comprehensive analysis of deep models, performance evaluation methodologies, open issues, and challenges for VCAD. Therefore, the main objective of this review is to provide an insightful analysis of several deep models for VCAD, to compare them on different datasets using various performance metrics, and to discuss the future research scope for VCAD.

6 Natural language and mathematical reasoning
p. 129–158 (30)
Every natural language presents certain regularities that have been studied for years. From statistical approaches to machine learning, many authors have found that automatic processing is far more complex than any other brain production. Despite the current state of the field, many applications, such as chatter-bots, speech recognition, and sentiment analysis, show that there is still an interesting gap between the analysis and the production of sentences. Despite the solutions used nowadays, the deep essence of linguistic reasoning dynamics remains mostly unrevealed, and many of the current approaches involve restricted patterns for human linguistic reactions, usage, and interpretation. This chapter presents a perspective on this problem centered on the idea, followed by many authors, that the brain is a complex device working under some kind of fractal rules and is deeply related to entropy. As part of the scope, there is a short review of some of the most paradigmatic proposals for mathematical descriptions of language related to entropy and fractals, the relationship between them in this context, and a summary of previous publications of the author that aim to reveal the deep nature of natural language by modeling linguistics-related brain activity.
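To make the notion of entropy in language concrete, the sketch below computes the Shannon entropy of the character distribution of a text, in bits per character. It is only an illustrative example of an entropy measure; it is not taken from the chapter, and the function name `shannon_entropy` and the sample strings are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (bits per character) of the character distribution of text."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)
    # H = -sum(p * log2(p)) over the observed characters.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample strings: a repetitive text has lower entropy
# than one whose characters are spread more evenly.
for sample in ("aaaaaaaa", "abababab", "the quick brown fox"):
    print(repr(sample), round(shannon_entropy(sample), 3))
```

A text that repeats a single symbol has zero entropy, while a text whose symbols are equally likely reaches the maximum of log2(k) bits for k distinct symbols.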

7 AI and machine learning in medical data processing
p. 159–172 (14)
A seizure is defined as a sudden synchronous activity of a group of neurons causing sudden movement of the body. Nearly 10 million people in India suffer from epilepsy. The electroencephalogram (EEG) is a non-invasive technique for measuring the neural activity of the brain. EEG signal processing and speech signal processing have applications in seizure detection. Sudden neural activity in the brain is reflected in the EEG signal and is processed using machine learning and deep learning techniques for efficient seizure detection. This chapter gives an overview of different speech processing and signal processing techniques for seizure detection. Deep learning and machine learning techniques are implemented and the results are discussed. Different techniques are compared to give researchers working in this field a future direction. A long short-term memory (LSTM) network model is applied to seizure detection and the results are discussed.

8 Progress of deep learning in digital pathology detection of chest radiographs
p. 173–206 (34)
Chest radiographs are one of the primary diagnostic medical imaging modalities in present-day clinical medicine. Compared to other medical imaging techniques, this non-invasive imaging modality is cost-effective. As a result, improving radiography-based computer-aided diagnostic methods is a fruitful approach for obtaining reliable diagnostic results. It also benefits a wider clinical community around the globe, especially in low-income countries. Recently, deep learning (DL) has shown promising performance in pathology detection in chest radiography used for the diagnosis of cancers, respiratory diseases and some infectious diseases. As a result, various DL applications have been proposed for image enhancement, object detection and segmentation, localization and image generation. Therefore, the prime aim of this chapter is to critically review those recent applications to determine the performance gain, technical challenges and future research trends.

9 Computer vision and modern machine learning techniques for autonomous driving
p. 207–253 (47)
Vision is one of the most important human senses. When humans, as the most intelligent living creatures, look at a scene, they always extract features from the information in that scene in order to understand its content. Using the extracted features, humans pay attention to the part of the scene that contains the most valuable information and then turn their gaze to other parts of it, until they have analyzed all the relevant information. This is a natural and instinctive human behavior for gathering information from the scenery and the surrounding environment, and it happens very quickly. Computer vision does not yet understand the content of images as quickly as a human observer understands a scene, but over many years there has been an effort to increase the accuracy and speed of computer vision by imitating the behavior of human vision. In recent years, due to significant advances in artificial intelligence and the emergence of deep learning, we are witnessing increasing growth in computer vision and its related areas, including autonomous driving. Recent works have demonstrated the successes of computer vision and deep learning algorithms in various domains, including autonomous driving and robotics. In this chapter, we provide a detailed review of the state-of-the-art computer vision techniques for self-driving cars and some recent research advances in this field. After outlining the challenges of autonomous driving, we concentrate on five perspectives of autonomous driving from the viewpoints of visual perception and computer vision: (a) object detection, (b) object tracking, (c) segmentation, (d) deep reinforcement learning, and (e) 3D scene analysis. Each of these perspectives is then analyzed and evaluated via computer vision techniques using datasets for object detection, object tracking, segmentation, lane detection, and localization as well as mapping. Moreover, we introduce sensors as well as the CARLA simulator. Finally, we provide a comprehensive analysis of localization, mapping, and simultaneous localization and mapping as they relate to computer vision for autonomous driving systems (ADS).

10 Dehazing and vision enhancement: challenges and future scope
p. 255–276 (22)
Under poor visibility conditions, the visibility of outdoor images is drastically decreased. Applications using computer vision, including surveillance systems and intelligent transportation systems, are not able to function properly due to limited visibility. Numerous image dehazing methods have been introduced as a solution to this problem, and they are crucial in enhancing the functionality of several computer vision systems. As a consequence, dehazing approaches are of great interest to researchers. In order to demonstrate that dehazing techniques can be successfully used in actual practice, this study conducts an extensive examination of the state-of-the-art dehazing approaches. It also motivates scholars to apply some of these methods for removing haze from hazy images. In this chapter, we discuss several robust mathematical models along with some neural network-based approaches and their implementations in various aspects. Finally, we address several concerns about difficulties and potential future applications of dehazing approaches.
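The abstract does not name the mathematical models the chapter discusses. One model widely used in the dehazing literature is the atmospheric scattering model, I = J·t + A·(1 − t), where I is the observed hazy intensity, J the haze-free scene radiance, t the transmission, and A the atmospheric light. The sketch below is a minimal illustration under that assumption: it applies the model to a single normalized intensity and inverts it. The function names, the `t_min` floor, and the sample values are hypothetical.

```python
def add_haze(radiance, airlight, transmission):
    """Forward atmospheric scattering model: I = J * t + A * (1 - t)."""
    return radiance * transmission + airlight * (1.0 - transmission)


def remove_haze(observed, airlight, transmission, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A, clipped to [0, 1].

    t_min is an assumed lower bound that avoids dividing by a
    near-zero transmission in very dense haze.
    """
    t = max(transmission, t_min)
    recovered = (observed - airlight) / t + airlight
    return min(1.0, max(0.0, recovered))


# Hypothetical values for one pixel, intensities normalized to [0, 1].
scene = 0.2      # haze-free radiance J
airlight = 0.9   # atmospheric light A
t = 0.5          # transmission

hazy = add_haze(scene, airlight, t)
restored = remove_haze(hazy, airlight, t)
print(f"hazy={hazy:.3f} restored={restored:.3f}")
```

In practice neither A nor t is known in advance; estimating them from the hazy image is the hard part of the problem, and this sketch simply assumes both are given.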

11 Machine learning and revolution in agriculture: past, present and future
p. 277–298 (22)
Agriculture is one of the key sectors of the economy. Traditional modes of farming are not able to meet the growing need for food as the population increases rapidly. Agricultural automation is essential to meet the supply-demand requirement for food and to minimize employment issues and problems of food security. The introduction of artificial intelligence (AI) in agriculture has brought a revolution by improving overall accuracy and harvest quality and by detecting pests and diseases in plants using applications like drones, smart monitoring systems and robots. Agricultural AI bots can harvest crops faster and in higher volume, which reduces the need for large numbers of workers. Machine learning (ML), a subdomain under the umbrella of AI, is also used to capture the quality of seeds, pruning, soil parameters, the application of fertilizer and environmental conditions. In addition, using AI and ML, farmers can address other challenges such as forecasting crop prices, market demand analysis, finding the optimal time and conditions for harvesting and sowing, nutrient deficiencies in soil, and weight-diet balance using weight prediction systems. Using predictive analysis, ML techniques help to predict the right genes for different weather conditions and to reduce the chances of crop failure. The aim of this chapter is to provide an insight into the effectiveness of introducing ML in the field of agricultural applications. This chapter covers present scenarios of agricultural need, challenges, the application of ML techniques and the future development of agricultural applications using ML.

12 AI- and ML-based multimedia processing for surveillance
p. 299–314 (16)
Nowadays, surveillance systems yield some of the most critical and largest volumes of data in the world from various sources. These data require proper management and analysis to produce relevant security information for modern security operations. However, it is still a challenge for humans to vigilantly monitor these large volumes of surveillance data for security assurance. Considering the upsurge in smart technologies, such as artificial intelligence (AI), machine learning (ML), deep learning (DL), and many more, present security systems can be equipped with these technologies to radically increase the efficacy of surveillance systems. The self-learning capabilities of AI and ML technologies make a great impact on surveillance systems. This chapter comprehensively discusses the possible amalgamation of AI and ML technologies with modern surveillance systems that will give them a technological edge. It also discusses new findings based on object detection, visual sentiment analysis, video analytics, vehicle analytics, and tracking people for potential crimes to ensure security for society. Moreover, the specific types of AI and ML surveillance infrastructure being deployed are also discussed.

13 Action recognition techniques
p. 315–334 (20)
Due to the common availability of surveillance as well as mobile cameras, data generation in the form of images and videos has increased greatly. This in turn has increased the requirement for data storage, and so manual analysis and interpretation is almost impracticable. Researchers have been working on the development of efficient as well as high-speed technologies capable of intelligent analysis and interpretation of visual data.
Video surveillance has long been in practice, but accuracy and high speed in detection and recognition still remain a challenge. This includes a variety of areas of research such as:
Pattern recognition: This aims at classifying the objects or data in the input images based on their inherent characteristic features.
Object tracking: It aims at tracking the object's location and movements in video sequences to monitor its activities and behavior.
Reconstruction: It typically aims at building two-dimensional or three-dimensional models of objects for advanced feature detailing.
Feature extraction: It is the process of reducing the dimensionality of the input image by extracting and selecting a subset of salient features representing the input data.
Segmentation: This provides for the extraction of the region of interest (ROI) from the gallery image based on different features. It divides the probe image into nonoverlapping areas, namely ROI areas and non-ROI areas.
Action recognition is a basic task in computer vision that focuses on identifying and understanding human activities from video streams. It is important for various applications, including video surveillance (static or dynamic), human-computer interaction (HCI), sports video analysis, and autonomous vehicles. Human action recognition covers an extremely large number of research topics in computer vision and has a wide range of applications in visual surveillance. Action recognition in visual surveillance refers to the process of automatically analyzing and understanding human actions and activities captured by surveillance cameras or video feeds. This plays a crucial role in various applications, such as security monitoring, anomaly detection, and behavior understanding.
The following sections present a review of action recognition in visual surveillance, including its challenges, techniques, and recent advancements.

14 Conclusion
p. 335–338 (4)
This book presents state-of-the-art research in various fields of multimedia processing and computer vision along with the applications of artificial intelligence, machine learning, and deep learning to perform various processing tasks in numerous applications, including medical imaging, robotics, remote sensing, autonomous driving, law enforcement, biometrics, multimedia enhancement and reconstruction, agriculture, and security. The book also provides a detailed discussion of the latest trends in processing tools required for computer vision applications. This is an attempt to provide a practical and adequate platform for researchers and practitioners from all over the world working in the fields of image processing, biometrics, computer vision, machine learning, and deep learning.
This book covered cutting-edge research from reputed research and academic organizations with a particular emphasis on interdisciplinary approaches, novel techniques, and solutions to provide intelligent multimedia for potential applications. We first covered recent trends, new concepts, and state-of-the-art approaches in the field of multimedia information processing for various emerging applications. We ended the book with a chapter on future perspectives and research directions. The chapter-wise conclusion of this book is as follows.

Back Matter
(1)
