Big Data Recommender Systems - Volume 2: Application Paradigms
2: North Dakota State University, Fargo, ND, USA
3: University of Sydney, Sydney, NSW, Australia
First designed to generate personalized recommendations to users in the 90s, recommender systems apply knowledge discovery techniques to users' data to suggest information, products, and services that best match their preferences. In recent decades, we have seen an exponential increase in the volumes of data, which has introduced many new challenges. Divided into two volumes, this comprehensive set covers recent advances, challenges, novel solutions, and applications in big data recommender systems. Volume 2 covers a broad range of application paradigms for recommender systems over 22 chapters. Volume 1 contains 14 chapters addressing foundations, algorithms and architectures, approaches for big data, and trust and security measures.
Inspec keywords: Big Data; recommender systems
Other keywords: big data recommender systems; application paradigms
Subjects: Search engines; Database management systems (DBMS); Data handling techniques; General and management topics; Information networks
- Book DOI: 10.1049/PBPC035G
- Chapter DOI: 10.1049/PBPC035G
- ISBN: 9781785619779
- e-ISBN: 9781785619786
- Page count: 536
- Format: PDF
Front Matter
(1)
1 Introduction to big data recommender systems—volume 2
p. 1–7 (7 pages)
The rapid development of e-commerce websites and social networking applications has drastically increased the volume of data generated online, giving rise to the term big data. With the Internet population rising to 3.2 billion worldwide, on average 2.5 quintillion bytes of data are generated daily [1]. Such large volumes of data have introduced the information overload problem, in which it is difficult to find the most relevant information among numerous diverse sources, e.g., websites, blogs, e-commerce, and social networking applications. The growing size of data has forced the research community to think beyond the simple search problem to the next level of filtering pertinent information [2]. The past few years have seen significant progress in the development of powerful and intelligent tools that process and analyze complex patterns in big data to extract knowledge that is more meaningful for users. The ability to create intelligence from the analysis of raw data has been successfully applied to diverse areas, such as business, industry, the sciences, social media, and e-commerce, to name a few. The ever-growing volume, complexity, and dynamicity of online information have necessitated the use of recommender systems as an appropriate tool for facilitating and accelerating the process of information engineering. Recommender systems apply numerous knowledge discovery techniques to users' historical and contextual data (e.g., location, time, preference, weather, device, and mood) to suggest information, products, and services that best match the user's preferences [3].
2 Deep neural networks meet recommender systems
p. 9–33 (25 pages)
Deep learning has been widely used in many software disciplines in both academia and industry, including computer vision, speech recognition and translation, natural language processing, search engines, bioinformatics, sensor data processing, and finance, owing to its scalability in big data environments and its higher accuracy than earlier approaches. In particular, deep neural networks can exploit the parallel computational power of GPUs to accelerate the learning process and ensure higher efficiency for big data problems.
3 Cold-start solutions for recommendation systems
p. 35–56 (22 pages)
Recommendation systems are essential tools for overcoming the choice overload problem by suggesting items of interest to users. However, they suffer from a major challenge, the so-called cold-start problem, which typically arises when the system has no data on new users or new items. In this chapter, we describe the cold-start problem in recommendation systems. We mainly focus on collaborative filtering systems, which are the most popular approach to building recommender systems and have been successfully employed in many real-world applications. Moreover, we discuss multiple scenarios in which cold start may occur in these systems and explain different solutions for them.
4 Performance metrics for traditional and context-aware big data recommender systems
p. 57–69 (13 pages)
The recommender system (RS) concept was coined in the mid-1990s, when researchers took an interest in recommendation problems that primarily used ratings to capture user preferences for different items. Much work in this area has focused on recommending the most relevant information and content to users without taking contextual information, such as date, time, location, and event, into account. In the last few years, context-aware recommender systems (CARS) have made tremendous contributions in all domains of life and improved the recommendation process by using contextual information alongside the traditional approaches. The effectiveness of an algorithm can be measured by how efficiently it returns recommendations to users/customers with respect to context or occasion. To assess the effectiveness and performance of a recommender algorithm completely, some common metrics are defined beforehand.
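The abstract does not list the metrics the chapter covers. As an illustration only, the sketch below computes precision@k and recall@k, two metrics commonly used to evaluate ranked recommendations; the function name and the sample data are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user's ranked list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: a ranked recommendation list and the items the user liked.
ranked = ["a", "b", "c", "d", "e"]
liked = {"a", "c", "f"}
p, r = precision_recall_at_k(ranked, liked, k=3)
print(f"precision@3={p:.3f} recall@3={r:.3f}")
```

A context-aware evaluation could apply the same function separately per context (for example, per time of day) and compare the resulting scores.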
5 Mining urban lifestyles: urban computing, human behavior and recommender systems
p. 71–81 (11 pages)
In the last decade, the digital age has sharply redefined the way we study human behavior. With the advancement of data storage and sensing technologies, electronic records now encompass a diverse spectrum of human activity, ranging from location data [1,2], phone [3,4], and email communication [5] to Twitter activity [6] and open-source contributions on Wikipedia and OpenStreetMap [7,8]. In particular, the study of the shopping and mobility patterns of individual consumers has the potential to give deeper insight into the lifestyles and infrastructure of a region. Credit card records (CCRs) provide detailed insight into purchase behavior and have been found to reveal inherent regularity in consumer shopping patterns [9]; call detail records (CDRs) present new opportunities to understand human mobility [10], analyze wealth [11], and model social network dynamics [12].
6 Embedding principal component analysis inference in expert sensors for big data applications
p. 83–105 (23 pages)
The increasing relevance of big data applications in fields such as the Internet of Things (IoT) and Industry 4.0 means that sensors are required to be secure and accurate. In recent years, sensors have evolved toward complex monitoring functionalities, increasing the complexity of the data, so that the analysis stage is usually performed away from the sensor layer, i.e., in the fog or the cloud. This separation raises issues of response time and security. As a way to move this data analysis closer to the edge, embedded machine-learning (ML) techniques have proven to be a good solution, leading to expert sensors. Feature extraction tools, such as principal component (PC) analysis (PCA), might reduce the amount of data transmitted through the network while adding security, because information is not transmitted as raw data. However, PCA is time-consuming and should therefore be carefully optimized for the hardware used in the sensor device. This chapter proposes to embed the PCA inference stage in a low-cost field-programmable system on chip (FPSoC) while performing a design space exploration for a general PCA inference problem. To this end, the authors analyze metrics such as latency, scalability, and usage of hardware resources. The resulting architectures are compared to a multicore OpenMP approach executed on an ARM processor, analyzing the speedup advantages of the FPSoC implementation.
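To make the PCA inference stage concrete, the sketch below shows the arithmetic it involves: centering a sample with the training mean and projecting it onto precomputed components, y = W(x − μ). This is a plain-Python illustration under assumed values, not the chapter's FPSoC or OpenMP implementation; the function name, the mean vector, and the component matrix are hypothetical.

```python
def pca_project(x, mean, components):
    """Project one sample onto precomputed principal components.

    x: d floats (one sensor reading); mean: d floats (training mean);
    components: k rows, each a unit-length direction of d floats.
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # One multiply-accumulate chain per component: the part that
    # hardware implementations typically parallelize.
    return [sum(w * c for w, c in zip(row, centered)) for row in components]


# Hypothetical model, as if learned offline: 3 channels reduced to 2 values.
mean = [1.0, 2.0, 3.0]
components = [
    [1.0, 0.0, 0.0],
    [0.0, 0.6, 0.8],
]
sample = [2.0, 2.0, 4.0]
scores = pca_project(sample, mean, components)
print(scores)
```

Only the `k` projected values would need to leave the sensor, instead of the `d` raw channels, which is the data-reduction effect the abstract describes.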
7 Decision support system to detect hidden pathologies of stroke: the CIPHER project
p. 107–124 (18 pages)
Currently, it is difficult to find platforms connected to health systems that exploit data coherently and that make it possible, on the one hand, to send health warnings and, on the other, to validate the performance of medical specialists against the models set by the best practices of the specialty. This chapter explains the CIPHER project, a decision support system (DSS) based on machine-learning (ML) and big data technologies that can alert a clinician when a risk situation is detected in a patient suffering from a certain pathology, so that the clinician can take the appropriate measures. CIPHER is a project built from scratch. Different methodologies have been applied in its development, such as design sprint (for product prototyping), navigational development techniques (for product analysis and testing), and SCRUM (for product development). In addition, the product has been defined in direct contact with medical specialists and under the umbrella of international standards and models such as ISO 13606, SNOMED, REGICOR, and CHADS2. The result is a DSS that offers health professionals the possibility of receiving alerts about patients who may be at risk of suffering from a specific pathology, based on a series of criteria defined by international standards. Moreover, health professionals would be able to find hidden symptomatology of that pathology that is not known a priori.
8 Big data analytics for smart grids
p. 125–144 (20 pages)
The Internet of Things (IoT) has recently emerged as an enabling technology for the next-generation electricity grid, namely, the smart grid (SG). The efficient operation of the smart electricity grid depends on the efficient acquisition, analysis, and processing of the large volume of data generated by smart sensors, individual smart meters, energy-consumption schedulers, aggregators, solar radiation sensors, wind-speed meters, and relays. Dealing with data of this size requires the adoption of advanced data analytics, big data management, and powerful monitoring techniques. This approach creates huge opportunities and challenges, especially regarding real-time monitoring; load, renewable energy, and price forecasting; identification and prediction of faults; and integration of electric vehicles functioning in a mobile SG environment. Among others, intelligent algorithms, robust data analytics, high performance computing (HPC), efficient data network management, and cloud computing (CC) techniques are critical to the optimized operation of the SG. This chapter presents the big data issues faced by SG networks and the corresponding solutions.
9 Internet of Things and big data recommender systems to support Smart Grid
p. 145–172 (28 pages)
Since its appearance, the Internet of Things (IoT) has revolutionized almost all aspects of our lives. Among the numerous and diverse present and potential applications of IoT, its use in the energy sector is of particular interest. The inclusion of IoT in the power industry and the evolution of the Smart Grid (SG) open up high-potential opportunities to optimize grid operation. Realizing SGs with smart metering technology or advanced metering infrastructure, with bidirectional IoT-based communication between demand and utility, could improve existing energy balancing procedures. Keeping energy consumption and supply in balance with minimal operating costs and optimal grid conditions is not an easy task, especially in the presence of renewable energy sources. Because the IoT relies on a large number of smart things/devices that generate a prodigious amount of data on a daily basis, successfully managing big data is a key issue. Obtaining valuable insights and knowledge from the gathered data requires the application of big data analytics. Hence, effective analysis and use of massive amounts of diverse data that arrive at high speed and can be of uncertain provenance are mandatory for obtaining valuable insights and enabling the creation of knowledge-based recommender systems. Big data analytics applied to data gathered from smart meters could be used to make valuable recommendations regarding consumption prediction, demand response and management programs, voltage and frequency control, state estimation, and power quality. The overall operation of the SG could be optimized in various aspects by using large-scale near real-time measurements.
The general aim of this chapter is to provide an overview of ongoing scientific research, recent technological innovations and breakthroughs, and the role of big data analytics in building recommendation systems that will facilitate the development and evolution of future global energy systems.
10 Recommendation techniques and their applications to the delivery of an online bibliotherapy
p. 173–185 (13 pages)
With the rapid progress of the economy and society, people have to endure unprecedentedly persistent and severe stress. Bibliotherapy is an effective way to help people cope with psychological stress. By selecting and recommending specific reading materials to patients with mental illness or emotional disturbance, it facilitates patients' recovery and rehabilitation. Existing bibliotherapy requires professional staff with a background in both psychology and library services to give reading recommendations, which is labor-intensive and demanding, and the booklists for different individuals need to be highly customized by therapists. To address this limitation, this chapter delivers an automatic reading recommendation solution for online bibliotherapy, aimed specifically at helping adolescents manage stress arising from study, family, peer relationships, self-cognition, and romantic relationships. A 6-week user study preliminarily demonstrated the effectiveness of the solution: the recommended articles had both a strong stress-easing effect and good attractiveness. This chapter first gives a brief review of bibliotherapy and recommendation techniques in the literature. It then reports the design and implementation of our reading recommendation system for easing teens' psychological stress. Finally, some application interfaces are provided to demonstrate the usage of the system.
11 Stream processing in Big Data for e-health care
p. 187–203 (17 pages)
In this chapter, we present stream processing and batch processing. We also conduct a qualitative comparison of two of the most popular data processing systems, namely Storm and Spark Streaming. We describe their respective underlying bases and the functionalities they provide, and discuss how they can be introduced into e-health care analysis programs.
12 How Hadoop and Spark benchmarking algorithms can improve remote health monitoring and data management platforms?
p. 205–234 (30 pages)
This chapter introduces the characteristics of the e-care platform and the concept of ontology, which help the reader understand the system that will implement big data tools for its migration. It also focuses on the most popular systems in the Hadoop ecosystem, emphasizing MapReduce and Spark.
13 Extracting and understanding user sentiments for big data analytics in big business brands
p. 235–257 (23 pages)
Consumer behavior has become the niche of the market for every participant, from manufacturer to customer. People are fairly good at expressing what they want, what they like, or even how much they will pay for an item, but they are not very good at assessing where that value comes from. Behavior is triggered by sentiments generated in response to an external stimulus. Sentiments and emotions are the subject of sentiment analysis and opinion mining, a field whose rise coincides with the rapid growth of social media on the web, e.g., social networks, blogs, and Twitter; for the first time, a huge volume of data (big data) is available in digital form for analysis. Developing algorithms that let computers recognize emotional expression is a widely studied area, and big data analytics and neuromarketing techniques are powerful tools for developing these algorithms to better understand consumer preferences, purchase behavior, and decision patterns. The research aims to extract and read user behavior and sentiment in order to predict future preferences and plan business branding policies. The major objective of this chapter is to perform data analytics on sample data using the Hadoop framework, based on crucial metrics related to consumer behavior: (1) customer acquisition cost; (2) customer retention cost; (3) lifetime value; (4) customer satisfaction and happiness; and (5) average purchase amount and behavior. Understanding these metrics helps in extracting customer buying trends that can be matched to specific customer personas and, in turn, to business strategies. The chapter provides a study of user sentiment using neuromarketing techniques and data analytics on the user-recorded sentiments based on consumer behavior metrics. It provides an understanding of (1) user sentiments, (2) consumer behavior and the neuromarketing process, and (3) big data analytics.
14 A recommendation system for allocating video resources in multiple partitions
p. 259–276 (18 pages)
A recommendation system, or recommender, aims to deliver meaningful recommendations for items or services to any interested party (e.g., users and applications). Recommenders produce their results from collected data related either to the descriptions of items and users or to ratings defined by users. Recommenders can be adopted in the domain of large-scale data management with significant advantages. Because of the huge volumes of data, many techniques separate the data into a number of partitions. Analytics are delivered on top of these data partitions and are then aggregated to form the final response to incoming queries. Data separation techniques can be used to allocate data to the appropriate partitions and thus improve the efficiency of analytics delivery. In this chapter, we propose a recommendation system responsible for allocating data to the most appropriate partition according to the partitions' current contents. Our approach facilitates the provision of analytics for each data partition by collecting “similar” data into the same partition. The aim is to support statistical insights into every partition in order to define query execution plans efficiently. We adopt a decision-making scheme combined with a naïve Bayesian classifier for deriving the appropriate partition. We focus on the management of streams of video files. The proposed recommender derives the appropriate partition for each incoming video file based on a set of characteristics. We evaluate our scheme through a set of simulations that reveal its strengths and weaknesses.
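The idea of deriving a partition with a naïve Bayesian classifier can be sketched as follows. This is a minimal illustration, not the chapter's scheme: it omits the decision-making component, and the class name, the feature names (`codec`, `resolution`), and the partition labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaivePartitioner:
    """Categorical naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.partition_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (partition, feature) -> value counts
        self.values = defaultdict(set)              # feature -> values seen so far

    def add(self, partition, features):
        """Record that a file with these features is stored in partition."""
        self.partition_counts[partition] += 1
        for name, value in features.items():
            self.feature_counts[(partition, name)][value] += 1
            self.values[name].add(value)

    def recommend(self, features):
        """Return the partition with the highest posterior log-probability."""
        total = sum(self.partition_counts.values())
        best, best_score = None, -math.inf
        for partition, count in self.partition_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.feature_counts[(partition, name)]
                score += math.log((seen[value] + 1) / (count + len(self.values[name])))
            if score > best_score:
                best, best_score = partition, score
        return best


model = NaivePartitioner()
model.add("p0", {"codec": "h264", "resolution": "1080p"})
model.add("p0", {"codec": "h264", "resolution": "720p"})
model.add("p1", {"codec": "vp9", "resolution": "4k"})
model.add("p1", {"codec": "vp9", "resolution": "1080p"})
target = model.recommend({"codec": "vp9", "resolution": "4k"})
print(target)
```

After a file is placed, calling `add` with the chosen partition keeps the statistics aligned with the partitions' current contents.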
15 A mood-sensitive recommendation system in social sensing
p. 277–291 (15 pages)
This chapter reviews a mood-sensitive (MS) recommendation system in social sensing. This work is motivated by the need to provide reliable information recommendation to users in social sensing. The key idea of social sensing is to use humans as sensors to observe and report events in the physical world. We define the measurements from human sensors as claims. A key challenge in social sensing is truth discovery, where the goal is to distinguish truthful claims from false ones and to estimate the reliability of data sources with minimal prior knowledge of both the sources and their claims. While current solutions have made progress on this challenge, an important limitation remains: the mood sensitivity of human sensors has not been fully explored. Therefore, the true claims identified by existing schemes can be biased and lead to useless or even misleading recommendations. In this chapter, we present an MS recommendation system that incorporates mood sensitivity into the truth discovery solution. The reviewed recommendation system estimates (i) the correctness and mood neutrality of claims and (ii) the reliability and mood sensitivity of sources. We compare our model with existing truth discovery solutions using four real-world datasets collected from online social media. The results show that the reviewed recommendation system outperformed the baselines by finding more correct and mood-neutral claims.
16 The paradox of opinion leadership and recommendation culture in Chinese online movie reviews
p. 293–315 (23 pages)
In this empirical study of online leadership analysis for movie recommendations on Douban, one of the biggest interest-oriented Chinese-language online social networking systems of its kind, we address the identification of the characteristics of key opinion leaders using a big data processing framework. As an illustrative case study, we focus on a niche subset of popular audience content on Douban: approximately half a million short comments on the 94 most popular South Korean films produced between 2003 and 2012. Raw data samples, including film details, review comments, and user profiles, are harvested via an asynchronous scraping crawler, and their heterogeneous features are then manipulated accordingly. Finally, a parallel association rule-mining (ARM) algorithm is employed to reveal leadership patterns. The proposed framework explains how to extract high-level features that can then be used to gauge the effectiveness of these so-called key leaders and their ability to generate word-of-mouth (WOM) awareness and interest surrounding their recommendations. In turn, researchers can edge closer to determining the kind of charismatic 'soft power' appeal of leading reviewers and reviews that gives follower networks new opportunities to evaluate a film and ultimately to decide to view it.
17 Real-time optimal route recommendations using MapReduce
p. 317–337 (21 pages)
To avoid complications in the decision-making process, recommendation systems suggest a ranked list of items that best meet a specific user's requirements. One useful type of recommendation system is the Route Recommendation System (RRS). Route recommendation apps provide a variety of services for their users, such as beating the traffic, finding a new and ideal route depending on road conditions, helping disabled people reach their destination independently, guiding strangers such as tourists in an unfamiliar area, and leading pedestrians in an emergency. In this chapter, we present an overview of RRSs and their details. After presenting the basic concepts, we classify RRSs by the services they provide. We also discuss the input data and answer the question “Why is it big?” Our aim is to provide a layered architecture for RRSs that can deal with such big data and also serve optimal real-time recommendations. To this end, the big data technologies mapped to each layer are introduced. Moreover, we briefly discuss the MapReduce paradigm and its strengths as one of the techniques that make parallel computation possible.
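The MapReduce paradigm mentioned above can be illustrated with a small in-process simulation. The sketch below is not Hadoop code and not the chapter's architecture; it only mimics the map, shuffle, and reduce steps on hypothetical traffic reports of the form (road segment id, speed in km/h) to compute an average speed per segment.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (segment_id, speed) for one hypothetical GPS report."""
    segment_id, speed_kmh = record
    yield segment_id, speed_kmh


def reduce_phase(segment_id, speeds):
    """Average all speeds reported for one road segment."""
    return segment_id, sum(speeds) / len(speeds)


def run_mapreduce(records):
    groups = defaultdict(list)
    # Shuffle step: group every mapped value under its key.
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    # Each key is reduced independently, which is what a real
    # cluster exploits to run reducers in parallel.
    return dict(reduce_phase(key, values) for key, values in groups.items())


reports = [("A", 50.0), ("B", 20.0), ("A", 30.0), ("B", 40.0), ("C", 60.0)]
avg_speed = run_mapreduce(reports)
print(avg_speed)
```

Per-segment averages like these could then serve as edge weights for a shortest-path search when ranking candidate routes.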
18 Investigation of relationships between high-level user contexts and mobile application usage
p. 339–360 (22 pages)
With the widespread adoption of smartphones, users rely on various smartphone functions in their everyday lives. To reveal the behavior of smartphone users, many existing works collect low-level contexts, such as the location and movement status of users, from sensors (e.g., GPS, acceleration sensor) to predict the users' situations when they use smartphones. However, it seems that not only low-level contexts but also high-level contexts (e.g., how busy the user is, how healthy the user feels, whether it is a working day or a day off, and whom the user is with) have a significant impact on smartphone users' behavior. In our previous work, we developed a log-collection system that collects high-level contexts by questioning users directly. To collect a large number of logs from general smartphone users, this system adopts a game-based approach. So far, we have collected approximately 0.7 million logs from about 400 users. In this chapter, we investigate relationships between high-level user contexts and application usage by analyzing a large number of application usage logs collected through this system. Specifically, we report experiments in which we conducted association rule mining on the collected logs, and we show some findings. The study described in this chapter can serve as a guideline on how to collect big data on users' high-level contexts and how to apply them in important context-aware applications such as application recommendation.
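Association rule mining scores a rule such as "busy → uses mail app" by its support and confidence. The sketch below computes both on a handful of made-up log entries; the context labels and app names are hypothetical and do not come from the collected logs.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical logs: each entry mixes high-level contexts with the app used.
logs = [
    {"busy", "workday", "app:mail"},
    {"busy", "workday", "app:mail"},
    {"free", "day_off", "app:game"},
    {"busy", "day_off", "app:game"},
]
rule_support = support(logs, {"busy", "app:mail"})
rule_confidence = confidence(logs, {"busy"}, {"app:mail"})
print(rule_support, rule_confidence)
```

A full miner (for example, Apriori) would enumerate candidate itemsets and keep only rules whose support and confidence exceed chosen thresholds.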
19 Machine learning and stock recommendation
p. 361–383 (23 pages)
In this chapter, we develop a neural network (NN) model for stock classification using input features derived from widely known momentum factors and apply it to two problems: long-short strategy construction and stock recommendation. Empirical findings suggest that our model can create a long-short portfolio that generates a significant profit and a high Sharpe ratio (SR). It is also effective in making buy/hold/sell recommendations, although the evidence is less strong. Our model seems to be more powerful for cross-sectional prediction, while its ability for time-series prediction is limited. We also find that the economic performance of a model can be very different from its statistical performance. This underlines the importance of choosing an objective function that reflects economic performance and of evaluating models from both statistical and economic perspectives.
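The two economic quantities named above can be sketched in a few lines. This is an illustration under simple assumptions (equal weights, no transaction costs, a non-annualized Sharpe ratio), not the chapter's model; the tickers, scores, and returns are made up, and the scores stand in for the output of any classifier.

```python
import statistics


def long_short_return(scores, returns, n):
    """Equal-weight return from buying the n best-scored stocks
    and shorting the n worst-scored ones for one period."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    longs, shorts = ranked[:n], ranked[-n:]
    long_leg = statistics.mean([returns[s] for s in longs])
    short_leg = statistics.mean([returns[s] for s in shorts])
    return long_leg - short_leg


def sharpe_ratio(period_returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in period_returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Hypothetical model scores and realized one-period returns.
scores = {"AAA": 0.9, "BBB": 0.1, "CCC": 0.5, "DDD": 0.7}
returns = {"AAA": 0.04, "BBB": -0.02, "CCC": 0.01, "DDD": 0.02}
ls = long_short_return(scores, returns, n=1)

# Hypothetical sequence of portfolio returns over four periods.
sr = sharpe_ratio([0.02, 0.04, 0.00, 0.02])
print(ls, sr)
```

A model with high classification accuracy can still produce a poor `ls` if its errors fall on the stocks with the largest returns, which is one way statistical and economic performance can diverge.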
20 The role of smartphone in recommender systems: opportunities and challenges
p. 385–405 (21 pages)
The popularity of smartphones in people's daily lives brings new opportunities as well as challenges for recommender systems. The opportunities include newly available context data, e.g., user interaction time (usually from a native mobile app) and geo-location data (from built-in GPS sensors). This meta-information provides different ways of inferring user preference, which ultimately improves recommendation performance. For instance, with a record of tap-in and tap-out timestamps, dwell time can be estimated. This makes it possible to address the “silent viewing” issue by inferring people's implicit ratings, which benefits conventional recommender systems that suffer from rating sparsity. Meanwhile, the new challenges are mainly twofold. First, such side information is not included in conventional recommendation models and is therefore not easy to integrate. Second, recommendation via smartphones is itself a scenario different from the traditional PC-based one, which leads to “pitfalls” where existing techniques may fail. In particular, we focus on two representative recommendation scenarios on smartphones, i.e., app and point-of-interest (POI) recommendation. For the former, a conventional model may recommend apps that users would never download because it ignores potential conflicts between candidate apps and installed ones. For the latter, failing to model physical location may lead to candidates that are too far away. In this chapter, we reveal these issues and describe corresponding solutions.
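The dwell-time idea can be sketched as follows: subtract the tap-in timestamp from the tap-out timestamp, then map the result to an implicit rating. The linear mapping, the 60-second cap, and the 5-point scale are illustrative assumptions, not the chapter's method; the event data are made up.

```python
def dwell_seconds(tap_in, tap_out):
    """Dwell time from tap-in and tap-out timestamps (seconds since epoch)."""
    return max(0.0, tap_out - tap_in)


def implicit_rating(dwell, full_interest_after=60.0, scale=5.0):
    """Map dwell time to a rating in [0, scale].

    The linear ramp capped at full_interest_after seconds is an
    assumption made for this sketch.
    """
    return scale * min(dwell, full_interest_after) / full_interest_after


# Hypothetical events: (item id, tap-in time, tap-out time).
events = [("item1", 100.0, 130.0), ("item2", 200.0, 320.0), ("item3", 400.0, 400.0)]
ratings = {
    item: implicit_rating(dwell_seconds(t_in, t_out))
    for item, t_in, t_out in events
}
print(ratings)
```

Ratings inferred this way can fill in entries of a sparse user-item matrix for items the user viewed but never rated explicitly.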
21 Graph-based recommendations: from data representation to feature extraction and application
p. 407–454 (48 pages)
Modeling users in order to identify their preferences and then personalizing services on the basis of these models is a complex task, primarily because of the need to take into consideration various explicit and implicit signals, missing or uncertain information, contextual aspects, and more. In this study, a novel generic approach for uncovering latent preference patterns from user data is proposed and evaluated. The approach relies on representing the data using graphs, systematically extracting graph-based features, and using them to enrich the original user models. The extracted features encapsulate complex relationships between users, items, and metadata. The enhanced user models can then serve as input to any recommendation algorithm. The proposed approach is domain-independent (demonstrated on data from movies, music, and business systems) and is evaluated using several state-of-the-art machine-learning methods, on different recommendation tasks, and with different evaluation metrics. Overall, the results show a consistent improvement in recommendation accuracy across tasks and domains.
22 AmritaDGA: a comprehensive data set for domain generation algorithms (DGAs) based domain name detection systems and application of deep learning
p. 455–485 (31 pages)
Botnets now play an important role in malware distribution and are used by attackers as a primary means of proliferating malicious activities via the internet. To evade blacklisting, recent botnets make use of domain flux or internet protocol (IP) flux. This work focuses on domain flux. Domain flux uses domain generation algorithms (DGAs) to generate a list of domain names from a seed, and these domain names are used to contact the command and control (C&C) server until access to the system is obtained. This work presents a fully labeled domain name data set, named AmritaDGA, which can be used for research on detecting domain names generated by DGAs. We evaluate the efficacy of deep learning architectures on AmritaDGA, using Keras embedding as the domain name representation method. AmritaDGA is composed of two data sets. The first is collected from publicly available sources. The second is collected from an internal real-time network. The model trained on the public data set is evaluated on unseen samples from the public data set and on the private corpus. Deep learning architectures performed well in most of the test experiments. The baseline system has been made publicly available, and the data set is distributed for the Detecting Malicious Domain names (DMD 2018) shared task.
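Before an embedding layer can represent a domain name, the name has to be turned into a fixed-length sequence of integer ids. The sketch below shows one plausible preprocessing step; the alphabet, the left-padding choice, and `max_len` are assumptions made for illustration and are not taken from the AmritaDGA baseline.

```python
import string

# Id 0 is reserved for padding; hostname characters get ids 1..38.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_domain(domain, max_len=16):
    """Turn a domain name into a fixed-length list of character ids.

    Characters outside the alphabet are dropped, long names keep their
    last max_len characters, and short names are left-padded with 0.
    """
    ids = [CHAR_TO_ID[ch] for ch in domain.lower() if ch in CHAR_TO_ID]
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


encoded = encode_domain("ab.com", max_len=8)
print(encoded)
```

Sequences like `encoded` would be the integer input to an embedding layer with a vocabulary size of 39 (38 characters plus the padding id), followed by whichever deep learning architecture is being evaluated.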
Back Matter
(1)