Streaming Analytics: Concepts, architectures, platforms, use cases and applications
2: Government Arts and Science College, India
3: Department of Computer Science and Engineering, School of Computing, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, India
4: Department of Computer Science, Brunel University, UK
When digitized entities, connected devices and microservices interact purposefully, we end up with a massive amount of multi-structured streaming (real-time) data that is continuously generated by different sources at high speed. Streaming analytics allows the management, monitoring, and real-time analysis of live streaming data. The topic has grown in importance due to the emergence of online analytics and edge and IoT platforms. A real digital transformation is being achieved across industry verticals through meticulous data collection, cleansing and crunching in real time. Capturing those value-adding events and subjecting them to analysis is considered the prime task for achieving trustworthy and timely insights.
The authors articulate and accentuate the challenges widely associated with streaming data and analytics, describe data analytics algorithms and approaches, present edge and fog computing concepts and technologies, and show how streaming analytics can be accomplished in edge device clouds. They also delineate several industry use cases across cloud system operations, transportation, cyber security and other business domains.
The book will be of interest to ICTs industry and academic researchers, scientists and engineers as well as lecturers and advanced students in the fields of data science, cloud/fog/edge architecture, internet of things and artificial intelligence and related fields of applications. It will also be useful to cloud/edge/fog and IoT architects, analytics professionals, IT operations teams and site reliability engineers (SREs).
Inspec keywords: learning (artificial intelligence); cloud computing; data analysis; Big Data; Internet of Things
Other keywords: big data; business data processing; internet of things; data analysis; unsupervised learning; data mining; pattern classification; Internet; cloud computing; learning (artificial intelligence)
Subjects: Internet software; Mobile, ubiquitous and pervasive computing; Information networks; Data handling techniques; Data security; General and management topics; Unsupervised learning
- Book DOI: 10.1049/PBPC044E
- Chapter DOI: 10.1049/PBPC044E
- ISBN: 9781839534164
- e-ISBN: 9781839534171
- Page count: 468
- Format: PDF
-
Front Matter
p. (1)
-
1 Streaming data processing - an introduction
pp. 1–12 (12)
This chapter presents an overview of the basics of streaming data, the core components of streaming data processing architecture, the challenges associated with stream processing, and recent tools for stream processing.
-
2 Event processing platforms and streaming databases for event-driven enterprises
pp. 13–38 (26)
The Internet of things (IoT) devices and sensors are found everywhere, generating tons of multi-structured data every second. These tiny, trendy, networked and embedded computers communicate and correspond to produce a mammoth amount of poly-structured data. Herein lies the real opportunity: when the generated data gets collected, cleansed, and subjected to a variety of deeper and decisive investigations, it is possible to extract actionable insights from voluminous IoT data. The discovered knowledge comes in handy for product, solution and service providers to visualize and realize a plethora of next-generation, multifaceted, state-of-the-art, and intelligent devices, systems, and networks. These smart entities and elements are deployed in critical junctions and environments ranging from industrial plants, manufacturing floors, retail stores and airports to people's residences to deliver people-centric, event-driven, service-oriented, knowledge-filled, process-optimized, mission-critical, situation-aware, time-sensitive and composite services and applications.
Thus, the world is tending to be deeply and decisively connected and cognitive. Such extreme integration results in streams of event data and messages. To bring forth premium use cases, businesses have to have event streaming, storage, and processing systems in place to readily make sense of the event data. The event-driven architecture (EDA) style for event-driven applications is maturing and stabilizing. This chapter is dedicated to conveying how streaming databases combine with other EDA components to build and run futuristic streaming applications and services.
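As a rough illustration of how an append-only event log, subscribers and a continuously updated (streaming-database-style) view fit together, here is a minimal in-memory sketch; the `EventBus` class, topic names and handlers are illustrative stand-ins, not APIs from any platform discussed in the chapter:

```python
from collections import defaultdict, deque

class EventBus:
    """Minimal in-memory event bus: producers append events to a topic's
    append-only log, and subscribers react to each event as it arrives
    (a toy stand-in for a streaming platform paired with a streaming
    database)."""
    def __init__(self):
        self.subscribers = defaultdict(list)
        self.log = defaultdict(deque)  # durable record of what happened

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.log[topic].append(event)
        for handler in self.subscribers[topic]:
            handler(event)  # react in (near) real time

# A "streaming database" style materialized view: a running aggregate
# kept up to date by consuming the event stream.
bus = EventBus()
totals = {}

def update_totals(event):
    totals[event["sensor"]] = totals.get(event["sensor"], 0) + event["value"]

bus.subscribe("readings", update_totals)
bus.publish("readings", {"sensor": "t1", "value": 3})
bus.publish("readings", {"sensor": "t1", "value": 4})
print(totals["t1"])  # 7
```

The log keeps the true record of events, while the view is just one derived read model; any new consumer could replay the same log to build a different view.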
-
3 A survey on supervised and unsupervised algorithmic techniques to handle streaming Big Data
pp. 39–63 (25)
The more data an organization has, the more difficult it is to process, store, and analyze; on the other hand, the more data the organization has, the more accurate its predictions can be. With Big Data comes big responsibility. Big Data computing is typically classified into two types based on processing requirements: Big Data batch computing and Big Data stream computing. Big Data also requires strong encryption to keep data safe and private. This is where data science comes in. Many organizations, faced with the problem of being able to measure, filter, and analyze data, are turning to data science for solutions - hiring data scientists, people who are specialists in making sense of a huge amount of data. In general, this means using statistical models to create algorithms to sort, classify, and process data. In this chapter, a review of various algorithms essential for handling such enormous data streams for classification and clustering is given. These algorithms give us different techniques for dealing with Big Data.
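One example of the stream-friendly clustering algorithms such a survey covers is sequential (online) k-means, which updates centroids one point at a time instead of iterating over a stored dataset. The sketch below is a minimal one-dimensional version under that assumption:

```python
def sequential_kmeans(stream, centroids):
    """Online (sequential) k-means on 1-D data: each arriving point nudges
    its nearest centroid toward itself, so clusters are maintained in a
    single pass with no stored dataset."""
    counts = [0] * len(centroids)
    for x in stream:
        # assign the point to its nearest centroid
        j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        counts[j] += 1
        centroids[j] += (x - centroids[j]) / counts[j]  # running-mean update
    return centroids

cents = sequential_kmeans(iter([1.0, 1.2, 0.8, 9.0, 9.4, 8.6]), [0.0, 10.0])
print(cents)  # approximately [1.0, 9.0]
```

The learning rate `1/counts[j]` makes each centroid the exact running mean of the points assigned to it so far, which is why the update needs no second pass.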
-
4 Sentiment analysis on streaming data using parallel computing
pp. 65–75 (11)
Today everything has moved online and everyone has the chance to voice their opinion, and every opinion is important for the success of a company. Therefore, companies have started giving more and more importance to sentiment analysis, making it a huge field in itself and a hot research topic in natural language processing. As more and more people gain access to the Internet, it fills with opinions about products, which has led to a data explosion and big data. Since this large amount of data must be analyzed, we have to find mechanisms to make the analysis faster and more efficient. Parallel computing comes to the rescue here. Parallel computing has been a topic of research for many years and can be applied to sentiment analysis as well, which is what this work does. The various ways in which sentiment analysis can be done using parallel computing are compared in terms of efficiency and computation time. Streaming data has become the trend today as data is continuously added to the Internet, so performing sentiment analysis on streaming data with parallel computing is all the more helpful.
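A minimal sketch of the idea, assuming a toy word lexicon and Python's standard `concurrent.futures` for the parallel map; the chapter's actual methods and frameworks may differ:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy lexicon; real work would use a full sentiment lexicon
# or a trained classifier.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2, "love": 2}

def score(text):
    """Sum word polarities; the sign gives the sentiment label."""
    s = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

def score_batch_parallel(texts, workers=4):
    """Score many texts concurrently with an order-preserving parallel map."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, texts))

labels = score_batch_parallel(["I love this phone", "terrible battery", "it arrived"])
print(labels)  # ['positive', 'negative', 'neutral']
```

Because each text is scored independently, the work partitions trivially across workers; that embarrassingly parallel structure is what makes sentiment analysis a natural fit for parallel computing.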
-
5 Fog and edge computing paradigms for emergency vehicle movement in smart city
pp. 77–100 (24)
When things are connected to the cloud to handle their data in a centralized manner, several challenges become critical. It does not always make sense to move all the data to the cloud, and there are several scenarios where response time is critical. In those cases, distributing computational capacity is the solution, and there are two main ways of doing that: edge and fog computing.
This book chapter presents an introduction, background study, overview, comparison, benefits, use cases, challenges and the future of the edge and fog computing models, along with a case study of streaming analytics on big data for emergency vehicle movement in smart city traffic management, and a conclusion.
Controlling traffic signals in favor of emergency vehicles like ambulances is the need of the hour, especially with the increase in the number of COVID cases. It helps patients reach hospitals in time and saves lives. A case study of signal-free movement of emergency vehicles using fog and edge computing paradigms with streaming analytics on big data is explored in this book chapter.
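The preemption decision an edge node beside a signal might make can be sketched as follows; the threshold, function names and inputs are illustrative assumptions, not the chapter's design:

```python
# Hypothetical edge-node rule: preempt a traffic signal to green when an
# approaching emergency vehicle is within a threshold distance. Running
# this at the edge avoids the round-trip latency of a cloud decision.
PREEMPT_RADIUS_M = 300  # illustrative distance threshold in metres

def decide_signal(current_phase, ambulance_distance_m, approaching):
    """Return the phase the signal should show for the next cycle."""
    if approaching and ambulance_distance_m <= PREEMPT_RADIUS_M:
        return "GREEN"   # clear the corridor for the ambulance
    return current_phase  # otherwise keep the normal cycle

print(decide_signal("RED", 150, approaching=True))   # GREEN
print(decide_signal("RED", 500, approaching=True))   # RED
```

The point of the sketch is where the rule runs: evaluating it on a fog node or roadside edge device keeps the response time bounded even if cloud connectivity is slow or lost.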
-
6 Real-time stream processing on IoT data for real-world use cases
pp. 101–120 (20)
It is indisputably clear that real-time stream processing of IoT data can result in a series of personal, professional, and social use cases. In this chapter, we have explained how real-time processing of IoT data may lead to a series of real-world premium services and applications across industry verticals including manufacturing.
Databases have been an extremely important part of application development, irrespective of how they store data. They follow a paradigm where data is passively stored, waiting for commands from an external party to read or modify the data. Basically, these applications are CRUD based, with business logic added on top of a process run by humans through a user interface (UI). A problem with these CRUD-style applications is that they commonly lead to an infrastructure with lots of ad hoc solutions using messaging systems, ETL products and other techniques for integrating applications in order to pass data between them. Often code is written for specific integrations, and all this causes a mess of interconnections between applications in an organization.
It is better to move away from relying on humans working through a UI to a platform that is able to trigger actions and react to things happening in software. The solution leverages events and event streams. They represent a new paradigm where a system is built to support a flow of data through the business, reacting in real time to the events occurring. The core idea is that an event stream is a true record of what has happened. Any system or application can read the stream in real time and react to each event. This is comparable to a central nervous system, but for a software-defined company: a digital organization needs the software equivalent of a nervous system to connect all its systems, applications, and processes.
For this to work, we have to treat the streams of everything that is happening within an organization as data and enable continuous queries that process this data. In a traditional database, the data sits passively and an application or a user issues queries to retrieve data. In stream processing, this is inverted. Data is an active and continuous stream of events and queries are passive, reacting to and processing the events in the stream as they arrive. Basically, this is a combination of data storage and processing in real time. This is a fundamental change in how applications are built.
Other experts however believe that there are benefits to using more types of messages besides events. With only events available, they are also used to indirectly request something to happen in another service. This often increases the coupling between services and can create a very entangled choreography. Therefore, besides events, there is a need for two other types of messages: Commands, which represent an intent to change something, and Queries to fulfill a need for information.
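The inversion described above, with passive standing queries reacting to an active event stream rather than applications polling a passive database, can be sketched minimally as follows; the `StreamProcessor` class and the order events are hypothetical:

```python
class StreamProcessor:
    """Data is active, queries are passive: each arriving event is run
    through every registered standing (continuous) query."""
    def __init__(self):
        self.queries = []  # list of (predicate, action) standing queries

    def register(self, predicate, action):
        self.queries.append((predicate, action))

    def on_event(self, event):
        for predicate, action in self.queries:
            if predicate(event):
                action(event)  # react to the event, no polling involved

alerts = []
proc = StreamProcessor()
# Continuous query: flag large orders the moment they occur.
proc.register(lambda e: e["type"] == "order" and e["amount"] > 100,
              lambda e: alerts.append(e["id"]))

for ev in [{"type": "order", "id": 1, "amount": 50},
           {"type": "order", "id": 2, "amount": 250}]:
    proc.on_event(ev)
print(alerts)  # [2]
```

In a traditional database the query would run once against stored rows; here the query outlives any single event and fires continuously as the stream flows past it.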
-
7 Rapid response system for road accidents using streaming sensor data analytics
pp. 121–143 (23)
Every year, approximately 1.35 million deaths occur due to car accidents across the globe. There are about 1 in 106 chances of an average person dying in a car accident. Even though India mostly relies on road transport, it is the least safe mode in the country right now. This work aims to prevent loss of life by providing a rapid response system for road accidents. Many people lose their lives in road transport accidents, most often because the patient does not reach a hospital on time or an ambulance does not reach them on time. A large number of these fatalities can be prevented if medical help reaches the victims in the immediate, crucial window after the accident. Currently, the victims depend on another human contacting the helpline number to request assistance. This situation can be improved by developing a more efficient system through automation. This work focuses on designing an automatic rapid response system for such cases, which uses various sensors to detect whether the vehicle has met with an accident (or is even in flames) and sends the location of the vehicle to the nearest fire station and hospital. All of this, including the choice of fire station and hospital, is automated. Most of these sensors are inexpensive, and some are already found in today's vehicles. Hence, multiple lives of accident victims can be saved through the proper implementation of this work.
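A toy version of the detection-and-dispatch logic might look like this; the thresholds, sensor inputs and facility data are illustrative assumptions rather than the system's actual parameters:

```python
import math

# Illustrative threshold: a sudden deceleration spike above a few g is
# treated as a crash; a flame sensor overrides it as a fire incident.
CRASH_G_THRESHOLD = 4.0

def detect_incident(accel_g, flame_detected):
    """Classify the current sensor readings into an incident type."""
    if flame_detected:
        return "fire"
    if accel_g >= CRASH_G_THRESHOLD:
        return "crash"
    return None

def nearest(facilities, lat, lon):
    """Pick the closest facility by straight-line distance (a simplification;
    a real system would use road-network travel time)."""
    return min(facilities, key=lambda f: math.hypot(f["lat"] - lat, f["lon"] - lon))

hospitals = [{"name": "H1", "lat": 12.97, "lon": 77.59},
             {"name": "H2", "lat": 13.08, "lon": 80.27}]
incident = detect_incident(accel_g=6.2, flame_detected=False)
if incident:
    target = nearest(hospitals, 12.95, 77.60)
    print(incident, target["name"])  # crash H1
```

The sketch shows why the response can be automatic: both the classification and the choice of facility are pure functions of streaming sensor readings and a static facility list, so no human needs to be in the loop to raise the alert.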
-
8 Applying streaming analytics methods on edge and fog device clusters
pp. 145–162 (18)
IoT consists of millions of devices ranging from voice assistants to smart meters, in-store beacons, touch-enabled devices, etc., and these IoT devices are flooding the world. A study report expects around 75 billion IoT devices by 2025. Huge amounts of data are being collected from all these devices, and the problem is what to do with this data and how to analyse it. The two possible solutions are edge computing and fog computing. Both technologies bring computing capabilities to the local network, so that heavy computation tasks can be carried out locally within the network, mirroring the work that would otherwise be carried out in the cloud. In fog computing, data can be processed in a fog node or in an IoT gateway located within the LAN. In edge computing, data can be processed on the sensor or on the device itself, without transferring the data anywhere. In this way, a streaming analytics platform helps to build a model that collects and analyses the data to infer useful findings within an IoT device. This chapter outlines the contribution of edge and fog computing to streaming analytics platforms and frameworks.
-
9 Delineating IoT streaming analytics
pp. 163–190 (28)
With the massive surge in the number of developed and deployed IoT devices and sensors in mission-critical environments such as smart homes, hospitals, hotels, warehouses, retail stores, railway stations, ports, etc., the speed, size, scope and structure of IoT data are evolving fast. In this chapter, we specifically focus on capturing, cleaning and crunching IoT streaming data in real time to extract actionable insights for producing real-time, real-world IT and business services and applications.
Streaming data has become an important component of worldwide business houses. The enterprise data architecture has to take this trend into account in order to be futuristic and flexible. Due to the exponential growth of Internet of Things (IoT) devices and sensors, enterprises are keenly strategizing afresh to accommodate IoT data. Typically, IoT data and other data sources such as web applications, security logs, and device interactions are streaming data. Meticulously receiving streaming data and subjecting it deftly and quickly to analysis to make sense (and money) out of it is seen as a vital activity for any enterprise wanting to keep its edge over competitors. Analysis of streaming data yields timely and trustworthy insights for enterprises, and decision-makers and other senior management people can take tactically and strategically sound decisions with clarity and confidence.
In this chapter, we dig deeper in order to explain how all sorts of IoT streaming data can be subjected to a variety of studies with the intention of producing actionable insights. We also explain how the discovered knowledge is disseminated to participating and contributing IoT devices so that they exhibit a kind of adaptive behavior, and how the extracted insights can be supplied to senior management to visualize and realize next-generation cognitive IoT products, solutions, and services.
-
10 Describing the IoT data analytics methods and platforms
pp. 191–221 (31)
As explained in the beginning of this book, the overwhelming leverage of miniaturization, digitization, distribution, consumerization (mobility), consolidation, centralization and industrialization (cloud), compartmentalization (virtualization and containerization), and deeper connectivity technologies has a number of trendsetting and transformational implications for IT as well as businesses across the globe. Edge or fog computing through cloudlets and micro clouds is another potential phenomenon for next-generation IT. There will be a cool convergence in forming and firming up hyper-converged cloud environments to host and deliver smarter and more sophisticated applications for humanity at large.
All these advancements are bound to bring forth a number of distinct outputs and opportunities. The principal one among them is the enormous growth in data generation. Further on, there is greater variability, viscosity, and virtuosity in data scope, structure, and speed. That is, with the continuous growth of value-adding data sources and resources, the amount of data getting generated, captured, transmitted, and stored is tremendously huge. As data is turning out to be a strategic asset for any organization wanting to be decisive, distinctive and disciplined in its operations, offerings, and outputs, a host of competent technologies, tips, and tools are being unearthed to smartly stock all incoming and stored data and subject it to a variety of deeper investigations to gain actionable insights in time.
Extracting and extrapolating knowledge out of data heaps in time especially goes a long way in empowering every kind of enterprise and endeavor to be exceptionally efficient and effective in their deals, deeds, and deliveries. In this chapter, we dig deeper and dwell at length on the various analytical approaches, frameworks, algorithms, platforms, engines, and methods for squeezing value-adding and venerable insights out of IoT data.
-
11 Detection of anomaly over streams using isolation forest
pp. 223–246 (24)
Machine learning algorithms provide useful methods for detecting anomalies over streaming data. There are two major categories of machine learning algorithms for anomaly detection, namely supervised and unsupervised algorithms. Of these two categories, supervised algorithms need to be trained on a huge collection of 'training data' prior to the detection of anomalies in unforeseen data. Here, the training data is required to be labelled, where the label refers to a predefined class. But in reality, labelled data may not always be available to train the machine learning algorithms. In such situations, unsupervised algorithms provide a mechanism for detecting anomalies. In this chapter, the usefulness of unsupervised (clustering) algorithms for anomaly detection over streams is analysed.
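The chapter's title technique, isolation forest, isolates anomalies with random splits so that outliers end up with short average path lengths. Below is a compact pure-Python sketch on one-dimensional data; real implementations such as scikit-learn's `IsolationForest` handle multidimensional data, subsampling and score normalization:

```python
import random

def build_tree(data, depth, max_depth):
    """Isolation tree on 1-D data: random split values isolate outliers
    near the root, so anomalies have short average path lengths."""
    if depth >= max_depth or len(data) <= 1 or min(data) == max(data):
        return ("leaf", len(data))
    split = random.uniform(min(data), max(data))
    left = [x for x in data if x < split]
    right = [x for x in data if x >= split]
    return ("node", split,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))

def path_length(tree, x, depth=0):
    if tree[0] == "leaf":
        return depth
    _, split, left, right = tree
    return path_length(left if x < split else right, x, depth + 1)

def anomaly_score(forest, x):
    """Lower mean path length means the point is easier to isolate,
    i.e. more anomalous."""
    return sum(path_length(t, x) for t in forest) / len(forest)

random.seed(0)
data = [10 + random.random() for _ in range(100)] + [50.0]  # 50.0 is an outlier
forest = [build_tree(data, 0, max_depth=8) for _ in range(50)]
print(anomaly_score(forest, 50.0) < anomaly_score(forest, 10.5))  # True
```

Because a single random split almost always separates the far-away point from the dense cluster, the outlier's path length stays near one while inliers need many splits to isolate; averaging over many trees makes this contrast stable.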
-
12 Detection of anomaly over streams using big data technologies
pp. 247–262 (16)
Anomaly detection serves as a method for identifying and recognizing abnormal events that may occur in data across various application domains. Anomaly detection is very useful as it provides valuable and actionable information, such as the detection of fraud in the financial domain, the detection of intrusions in networking, etc. Detecting anomalies over streaming data requires efficient tools and techniques, as streaming data is a continuous flow with no start or end. As streaming data is associated with speed, big data platforms provide the fundamental base over which machine learning algorithms can be employed so that anomaly detection over streaming data can be performed efficiently. This chapter describes an Apache Kafka-based architecture in which detection of anomalies over streams is done using a machine learning algorithm.
-
13 Scalable and real-time prediction on streaming data - the role of Kafka and streaming frameworks
pp. 263–308 (46)
Increasingly, data gets generated and streamed from different and distributed data sources. To make sense of streaming data, the market is flooded with a variety of feature-rich streaming data analytics platforms and frameworks. Lately, the role and responsibility of Kafka, an open source streaming data platform, are growing steadily. In this chapter, we have written about the noteworthy contributions of Kafka in producing timely, trendsetting and predictive insights out of streaming data.
A new breed of "Fast Data" architectures has evolved to be stream-oriented, where data is processed as it arrives, providing businesses with a competitive advantage. The demand for stream processing is increasing every day in the digital era. The main reason is that processing mere volumes of data is not sufficient; processing data at faster rates and making insights out of it in real time is essential so that organizations can react to changing business conditions/sentiments in real time. Hence, there is a need to understand the concept of "stream processing" and the technology behind it. This chapter is prepared with the noble intention of articulating the need for scalable and real-time prediction on streaming data. There are competent technologies, tools, and techniques for real-time prediction on streaming data in a highly elastic manner. All these details are covered in this chapter in order to enlighten our readers. The major topics illustrated here include: 1. Streaming concepts; 2. Apache Kafka; 3. Apache Kafka Streams; 4. Apache Spark; 5. A sample machine learning (ML) application for real-time prediction on streaming data in a scalable fashion.
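As a tiny stand-in for the real-time prediction step such a pipeline performs, the sketch below predicts each incoming value from a sliding window of recent values; in practice this logic would live in a Kafka Streams processor or a Spark Structured Streaming job rather than a plain Python class:

```python
from collections import deque

class RollingPredictor:
    """Toy real-time prediction over a stream: predict the next value as
    the mean of a bounded sliding window of recent values. The bounded
    window is what keeps state constant-size no matter how long the
    stream runs."""
    def __init__(self, window=5):
        self.buf = deque(maxlen=window)  # old values fall off automatically

    def update_and_predict(self, value):
        # Predict from history *before* absorbing the new observation.
        prediction = sum(self.buf) / len(self.buf) if self.buf else value
        self.buf.append(value)
        return prediction

p = RollingPredictor(window=3)
preds = [p.update_and_predict(v) for v in [10, 10, 10, 13]]
print(preds[-1])  # 10.0
```

Comparing each prediction with the value that actually arrives is also a cheap way to flag drift or anomalies in the stream, which is one reason windowed state is a core primitive in Kafka Streams and Spark.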
-
14 Object detection techniques for real-time applications
pp. 309–333 (25)
Object detection is a method of identifying and locating items in a continuous image or video stream using computer vision. Background subtraction, optical flow, and frame difference are classical object detection approaches. In this study, we explore object detection algorithms, such as the convolutional neural network family (CNN, R-CNN, Fast R-CNN, Faster R-CNN) and YOLO, as well as associated applications and frameworks, which real-time applications use to achieve their goals.
This book chapter introduces computer vision and object detection, and discusses object detection architectures, methods, techniques, applications, and implementation sources.
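Of the three classical approaches named above, frame difference is the simplest; here is a minimal sketch on tiny grayscale frames represented as lists of pixel intensities (real systems would operate on camera images via a library such as OpenCV):

```python
def frame_difference(prev, curr, threshold=30):
    """Mark pixels whose intensity changed by more than `threshold`
    between two consecutive grayscale frames. Changed regions indicate
    motion; small fluctuations below the threshold are treated as noise."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 200, 10],   # a bright object entered the middle pixel
        [10, 10, 12]]    # small change below threshold, ignored
mask = frame_difference(prev, curr)
print(mask)  # [[0, 1, 0], [0, 0, 0]]
```

The resulting binary mask is typically the input to later stages such as blob grouping and bounding-box extraction, which is where the learned detectors (CNN families, YOLO) take over in modern pipelines.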
-
15 EdgeIoTics: leveraging edge cloud computing and IoT for intelligent monitoring of logistics container volumes
pp. 335–347 (13)
In logistics enterprises, a significant amount of the space available in vehicles goes unused. Currently, there is no method to measure and estimate the dimensions and volume of shipments in real time and use this data to prepare an efficient loading procedure. This leads logistics services to employ inefficient methods that drastically increase their resource needs and workload. To address this, this chapter offers an intelligent monitoring service that uses the Internet of Things (IoT), edge cloud, and web services to record and capture data about shipments. It uses this information to efficiently create a loading procedure for parcels based on the company's available assets. The proposed system continuously monitors capacity and position throughout container shipping and analyses the data on a remote cloud server to warn partners when a certain condition or violation occurs. We tested the system in a real-life setting and found that it correctly notifies partners when certain undesirable environmental conditions or events occur.
-
16 A hybrid streaming analytic model for detection and classification of malware using Artificial Intelligence techniques
pp. 349–365 (17)
In this era of technology and networks, the battle between security experts and malware developers is a never-ending fight, as every day malware with new signatures enters the fray. To compete in this technological battle, AI-based techniques can effectively combat and triumph. Various machine learning and deep learning techniques involved in the process of malware detection and classification with good accuracy are analyzed. ML-based techniques perform very well and are able to detect and classify even zero-day malware. A new hybrid methodology for the efficient detection and classification of malware is proposed, and the accuracy of malware detection is improved by the proposed ensemble method.
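As one common ensembling scheme, a majority vote over base detectors can be sketched as follows; the rule-based detectors and feature names here are hypothetical stand-ins for the trained ML/DL classifiers the chapter analyzes, and the chapter's proposed hybrid method may weight or stack models instead:

```python
from collections import Counter

# Hypothetical base detectors over illustrative features; in practice
# each would be a trained classifier rather than a single-feature rule.
def detector_size(sample):    return "malware" if sample["size_kb"] < 20 else "benign"
def detector_entropy(sample): return "malware" if sample["entropy"] > 7.0 else "benign"
def detector_api(sample):     return "malware" if sample["suspicious_apis"] >= 3 else "benign"

def ensemble_classify(sample, detectors):
    """Majority vote across base detectors: the label most detectors
    agree on wins, which dampens any single detector's mistakes."""
    votes = Counter(d(sample) for d in detectors)
    return votes.most_common(1)[0][0]

sample = {"size_kb": 12, "entropy": 7.4, "suspicious_apis": 1}
print(ensemble_classify(sample, [detector_size, detector_entropy, detector_api]))
# malware
```

Majority voting improves accuracy only when the base detectors make reasonably independent errors, which is why ensembles typically mix models trained on different features or algorithms.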
-
17 Performing streaming analytics on tweets (text and images) data
pp. 367–403 (37)
In today's world, Twitter is one of the most popular and powerful social networking platforms. People interact, voice their opinions, express their thoughts and start conversations about various issues through "tweets", and the market has started analyzing these tweets to get a better understanding of how the general public feels about products, schemes, or new policies. This helps organizations get the public's perspective directly and work towards becoming more people-friendly. Different organizations and consumers use various mechanisms to perform sentiment analysis. In this work, two different methods are proposed for Twitter sentiment analysis. The first uses Node-RED and IBM services: real-time data (tweets) is harvested from Twitter using the Twitter developer API, stored in IBM Cloudant, and the tweets are classified as positive, negative or neutral based on their sentiment score, which lies between −5 and +5. Using IBM Watson's text-to-speech service, the tweets are converted to audio so that the user can listen to them one by one, and the IBM Watson image analyzer is used to analyze what an image represents, along with the confidence and different features of the image. Using the visual recognition service, the tone of the tweet is analyzed among emotions like joy, anger and disgust, and a Node-RED dashboard can be created to see the sentiment scores of various tweets. The second method uses Python and machine learning: the sentiment analysis of the tweets is performed with sentiment scores using different classifiers, and those classifiers are compared on precision, recall, F1 score, and accuracy to conclude which of them is the most appropriate for this work. Real-time data is harvested from Twitter, converted into a CSV file and classified as positive, negative, or neutral according to sentiment scores using the classifiers.
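The classifier comparison described in the second method relies on precision, recall and F1. A minimal sketch of computing them for the positive class from parallel lists of true and predicted labels (label names are illustrative):

```python
def prf(true_labels, predicted):
    """Precision, recall and F1 for the 'pos' class, computed from
    true-positive, false-positive and false-negative counts."""
    tp = sum(t == p == "pos" for t, p in zip(true_labels, predicted))
    fp = sum(t != "pos" and p == "pos" for t, p in zip(true_labels, predicted))
    fn = sum(t == "pos" and p != "pos" for t, p in zip(true_labels, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = prf(["pos", "pos", "neg", "neg"], ["pos", "neg", "pos", "neg"])
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.5 0.5
```

Precision penalizes false alarms while recall penalizes misses; F1 balances the two, which is why it is a common single number for ranking classifiers on imbalanced tweet data.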
-
18 Machine learning (ML) on the Internet of Things (IoT) streaming data toward real-time insights
pp. 405–432 (28)
The world generates an unfathomable amount of data every day. The speed with which data gets generated, transmitted, ingested, and crunched has been nothing short of spectacular in the recent past. Expert thinkers and pundits across the globe are of the opinion that data is the transformation agent. Data is positioned as a strategic asset for any institution, innovator, and individual to grow and glow, and is being regarded as the new fuel for bringing in real and sustainable business transformation. It is a universally accepted fact that data can be methodically processed, mined, and analyzed to produce actionable insights. There are batch and real-time data processing methods for making sense of data heaps. Further on, there are integrated data analytics platforms in plenty to extract hidden patterns, associations, and other useful insights out of data volumes. The data analytics ecosystem grows continuously, considering the importance of data-driven insights and insight-driven decisions for businesses as well as people to be agile, adaptive and adroit in their dealings and deeds.
Of late, real-time data capture, storage, analytics, decision-making, and action are being insisted upon vehemently, considering the evolving business dynamics. Any data or message has to be carefully captured, cleansed, and crunched immediately in order to be really beneficial for businesses and commoners. It is indisputable that the value of data goes down sharply with time. Another facet is that agile and autonomous business systems are extremely event-driven. That is, any business event may trigger a suite of events across the enterprise. Thus, all kinds of event data/messages have to be received and processed in real time in order to activate and automate one or more business operations. In short, for envisaging and realizing real-time services, applications and enterprises, real-time data analytics capability is very much indispensable. Enterprises are therefore keenly strategizing and setting up analytics infrastructure modules with all clarity and alacrity to make sense of both internal and external data in time. Such a futuristic and flexible capability helps business houses steer sagaciously in the right direction.
-
Back Matter
p. (1)