First designed in the 1990s to generate personalized recommendations for users, recommender systems apply knowledge discovery techniques to users' data to suggest information, products, and services that best match their preferences. In recent decades, we have seen an exponential increase in the volume of data, which has introduced many new challenges. Divided into two volumes, this comprehensive set covers recent advances, challenges, novel solutions, and applications in big data recommender systems. Volume 1 contains 14 chapters addressing foundations, algorithms and architectures, approaches for big data, and trust and security measures. Volume 2 covers a broad range of application paradigms for recommender systems over 22 chapters.
Inspec keywords: trusted computing; security of data; recommender systems; Big Data
Other keywords: trust measures; big data recommender systems; recommendation approaches; architectures; algorithms; security measures
Subjects: Search engines; Data handling techniques; Information networks; General and management topics
In the past few years, numerous recommendation approaches have been proposed to address various challenges of recommender systems. However, there are still many open and unresolved issues that require novel and more efficient recommendation solutions to handle big data. The book Big Data Recommender Systems: Recent Trends and Advances consists of two comprehensive volumes. Each volume consists of high-quality chapters contributed by world-renowned researchers and domain experts. Volume 1 aims to cover the recent advances, issues, novel solutions, and theoretical research on big data recommender systems. The book encompasses original scientific contributions in the form of theoretical foundations, comparative analysis, surveys, case studies, techniques, and tools for recommender systems. A specific focus is devoted to emerging trends and the industry needs associated with utilizing recommender systems. Some of the topics covered in Volume 1 include benchmarking of recommendation algorithms using MapReduce, social recommendations, hybrid approaches (HAs), deep learning-based techniques, unstructured big data recommendations, machine learning (ML)-based models, and geo-social recommendations. A special section is included to cover security and privacy concerns, cyberattacks on recommender systems, and their defensive measures.
A recommender system (RS) is software that provides suggestions to a user in the decision-making process. The decision making may be for commercial purposes, personalized applications, or simple information retrieval. These systems have become an important component of almost all applications that involve some form of information retrieval and processing using various search techniques. This chapter provides the background to the theoretical foundations of RSs. We start with a traditional approach and discuss some commonly used definitions of RSs. After justifying the need for such systems in the current scenario of information overload, application areas where RSs can be useful and productive are discussed, covering both existing and possible future areas of application. The list presented is not exhaustive, as many areas have opted to add value to their applications by integrating some form of RS. The next section gives a brief overview of the phases through which an RS passes in order to perform its function. The types of RSs are discussed along with their advantages and disadvantages, and content-based recommenders, collaborative filtering (CF)-based recommenders, hybrid recommenders, image-based recommenders, and graph database (GDB)-based recommenders are covered in detail. After a brief overview of the problems identified with current RSs, some datasets are listed that may be used for research and evaluation of various RSs.
Recommender or recommendation systems have gained popularity in recent years, and big data is the driving force behind them. Recommendation systems have changed the way websites communicate with users by providing recommendations based on a user's history, such as purchases and searches. They are used in a variety of areas such as movies, music, research articles, and social tags; examples include Facebook's "People you may know," Netflix's "Because you watched," and YouTube's "Recommended for you." These systems usually produce a list of recommendations in two ways: collaborative and content-based (CB) filtering. Collaborative filtering (CF) is based on a model of prior user behavior, which can be constructed from a single user's actions or from the actions of other users with similar behaviors, while content-based filtering constructs a recommendation from a user's own behavior, such as historical browsing information. Apart from these, a hybrid approach can be used by combining the two models. Designing such systems requires computing function values at several thousand points and is thus computationally quite intensive, so parallel computation is needed to speed up the search for an acceptable solution, for instance through nature-inspired computation. Many factors are essential when designing accurate recommendation algorithms, among them diversity, recommender persistence, privacy, user demographics, trust, and labeling. A recommendation system cannot perform its job without data, and big data supplies large amounts of user data such as past purchase and browsing history [1,2]. In fact, an efficient recommendation system requires big data. The best-known solution is Hadoop, a platform used to store, generate, manage, and distribute big data easily across several large server nodes [3-5].
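As a minimal illustration of the collaborative filtering principle just described, the sketch below predicts a missing rating from the ratings of similar users. The data are a toy example, and cosine similarity with a similarity-weighted average is only one of many possible CF formulations:

```python
from math import sqrt

# Toy user-item rating matrix: user -> {item: rating} (hypothetical data).
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def cosine_sim(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other in ratings:
        if other != user and item in ratings[other]:
            s = cosine_sim(user, other)
            num += s * ratings[other][item]
            den += abs(s)
    return num / den if den else None

print(round(predict("alice", "m4"), 2))  # 3.18
```

Because Alice's taste resembles Bob's more than Carol's, the prediction lands closer to Bob's rating of 4 than to Carol's rating of 2.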
Hadoop offers the Hadoop distributed file system (HDFS), which distributes all the data across different clusters and performs parallel operations. This chapter will explore big data issues, specifically in Hadoop and HDFS.
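The map-shuffle-reduce pattern that Hadoop parallelizes over HDFS blocks can be simulated in a few lines of sequential code. The records are hypothetical; in a real job each phase would run on different cluster nodes:

```python
from collections import defaultdict
from itertools import chain

# Input records, as HDFS might split them across nodes (hypothetical data).
blocks = [
    [("u1", "item_a", 5), ("u1", "item_b", 3)],
    [("u2", "item_a", 4), ("u3", "item_b", 2), ("u3", "item_a", 1)],
]

def map_phase(record):
    """Emit (item, rating) pairs; each block could run on a different node."""
    user, item, rating = record
    yield (item, rating)

def shuffle(pairs):
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate each group: rating count and mean rating per item."""
    return key, len(values), sum(values) / len(values)

mapped = chain.from_iterable(map_phase(r) for b in blocks for r in b)
results = sorted(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(results)  # [('item_a', 3, 3.333...), ('item_b', 2, 2.5)]
```

The same three-phase structure underlies the MapReduce-based benchmarking of recommendation algorithms mentioned earlier.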
In this chapter, we present several approaches designed to provide efficient recommendations in large web systems characterized by big data scales. The key feature of the considered approaches is that they all rely on different elements and properties of social/complex network analysis to address various deficiencies of legacy and current recommendation systems when very large operational scales emerge. The main recommendation challenges addressed by the presented approaches are the diversity (novelty) of recommendations, the cold-start problem, scalability and noise-filtering issues, as well as the efficiency of developing these approaches and integrating them into operational systems. This chapter aspires to provide an educated overview, leading to a solid fundamental background on how social/complex network analysis can be exploited for more effective recommendations in stringent environments characterized by large scales of users, items, and associated data, cumulatively referred to as big network data. Furthermore, our work aims to highlight the design principles that are most interesting for enabling the extension of the presented approaches and their combination with other current state-of-the-art techniques, thus leading to more socio-aware and efficient recommendation approaches in the near and longer term.
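One simple way network analysis can drive recommendations, in the spirit of the approaches surveyed here, is to score items by their proximity in the user's social neighbourhood. The graph and the common-neighbour weighting below are hypothetical illustrations, not any specific method from the chapter:

```python
from collections import Counter

# Toy social graph (hypothetical): who follows whom, and what each user liked.
follows = {"ann": {"ben", "cid"}, "ben": {"cid", "dot"},
           "cid": {"dot"}, "dot": set()}
likes = {"ann": {"i1"}, "ben": {"i2", "i3"}, "cid": {"i2"}, "dot": {"i4"}}

def recommend(user, k=2):
    """Rank items liked within the user's 2-hop neighbourhood, weighting an
    item by how many paths from the user's direct contacts reach someone
    who liked it; a network signal that needs no dense rating matrix."""
    scores = Counter()
    for friend in follows[user]:
        for candidate in follows[friend] | {friend}:
            for item in likes[candidate] - likes[user]:
                scores[item] += 1
    return [item for item, _ in scores.most_common(k)]

print(recommend("ann"))  # ['i2', 'i4']
```

Because such scores exist as soon as a user has a few social links, this style of signal is one way to soften the cold-start problem mentioned above.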
Hybrid approaches (HAs) based on Fuzzy-Ontology are one of the core means to efficiently handle and process massive datasets from diverse heterogeneous sources (DHS). HAs have become a noticeable trend in recent times due to their wide range of functionality for tackling all types of problem spaces. HAs are in high demand, as organisations must run their daily business operations while ever larger numbers of datasets arrive every day. Big data therefore challenges traditional approaches to satisfying consumer needs, as data are often not captured into database management systems (DBMS) in a timely enough fashion to enable their subsequent use. At the same time, big data contains a wealth of value for all the fields represented in the DBMS. However, one of the main challenges of HAs for a big data integration (DI) system is the inherent difficulty of coherently managing data from DHS, as different data sources follow different standards and major systems. It is practically challenging to integrate diverse data into a global schema that achieves what is expected of it. The efficient management of HAs using an existing DBMS presents a challenge because of the incompatibility, and sometimes inconsistency, of data structures. As a result, no common methodological approach currently exists to effectively solve every DI problem. These challenges raise the need for a better way to efficiently integrate voluminous data from DHS. To handle and align massive datasets efficiently, an HA algorithm logically combining Fuzzy-Ontology with a big data analysis platform has shown improved accuracy. The proposed novel HA combines the promising features of Fuzzy-Ontology to search, extract, filter, clean, and integrate data, ensuring that users can coherently create new, consistent datasets.
This chapter introduces some recent trends in generative and deep-learning (DL) models for hybrid recommendation systems that have proven to be extremely effective in integrating different modalities of data. It is organized into three main sections. The first section considers classic algorithms such as probabilistic matrix factorization and latent Dirichlet allocation and illustrates the generative principle of a hybrid recommendation model called collaborative topic regression, which jointly models the latent interests of users and items. The second section presents recommendation models that are exclusively based on DL techniques, including restricted Boltzmann machine-based CF, autoencoder (AE)-based recommendation, neural CF, and the recurrent recommender network. Finally, the third section explains models such as the collaborative denoising AE and the collaborative variational AE, which integrate probabilistic graphical models (PGMs) with DL to create a generative DL framework.
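The core of matrix-factorization recommenders such as probabilistic matrix factorization can be sketched with plain stochastic gradient descent on a toy rating matrix. The data and hyperparameters below are hypothetical, and the chapter's models add priors and deep components on top of this core:

```python
import random

random.seed(0)

# Toy rating matrix with missing entries (None = unobserved).
R = [[5, 3, None, 1],
     [4, None, None, 1],
     [1, 1, None, 5],
     [1, None, 5, 4]]

K, lr, reg, epochs = 2, 0.01, 0.02, 2000
n_users, n_items = len(R), len(R[0])
# Latent factor matrices, initialised with small Gaussian noise.
P = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_items)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# SGD on observed entries: minimise (r - p.q)^2 + reg * (|p|^2 + |q|^2).
for _ in range(epochs):
    for u in range(n_users):
        for i in range(n_items):
            if R[u][i] is None:
                continue
            err = R[u][i] - dot(P[u], Q[i])
            for k in range(K):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)

# Predicted rating for an unobserved cell, e.g. user 0 on item 2.
print(round(dot(P[0], Q[2]), 1))
```

After training, the learned factors reconstruct the observed ratings closely and fill the missing cells; a probabilistic treatment places Gaussian priors on P and Q, which corresponds to the regularization term here.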
In recent times, the recommender system (RS) has played a significant role in assisting users in selecting the ideal product from a huge amount of data. The gradually increasing number of customers, services, and pieces of online information yields large-scale data that can be a problem for service RSs. Highly successful approaches cover a wide variety of recommendation tasks such as video, music, images, and books. Recommendation algorithms require a variety of parameters to propose suggestions for new users, and this is where the limitations and challenges of RSs emerge. Recommendation algorithms for unstructured big data and their challenges are discussed in this chapter.
Due to industrialization and urbanization, the rapid rise in the volume of hazardous waste, and its disposal, is becoming a burgeoning problem that the world faces today. One of the best ways to address this problem is to collect, sort, and reuse or recycle this waste. This work proposes an architecture christened deep segregation of plastic (DSP), which sorts waste materials into plastic and nonplastic using a deep learning technique, the convolutional neural network (CNN). The CNN is one of the most efficient modern machine learning techniques, able to achieve maximum learning efficiency from raw input samples; it has become a state-of-the-art method for many tasks in computer vision (CV), where it has performed well in comparison both to humans and to standard neural networks. The developed framework is highly scalable on commodity hardware servers. It collects data from different sensors, then preprocesses and analyzes them using distributed algorithms. While the framework is currently developed specifically for plastic segregation, it can easily be extended to handle large volumes of other waste categories by adding additional resources. These characteristics make the proposed framework stand out from other systems of a similar kind. The proposed design also includes a prototype which acts as a real-time classifier. The hardware setup consists of a conveyor belt over which the waste materials are placed and captured by a camera fitted on the system. The captured image is sent to the DSP, which classifies it as plastic or nonplastic, and accordingly it is moved to one of two bins. This system can reduce the human effort in separating plastics from nonplastics while keeping the environment neat and clean. The performance of the system is analyzed on various data sets.
These data sets are collected from public and private sources. Various experiments are run to identify the optimal parameters for the CNN networks and structures. All these experiments are run for up to 1,000 epochs with learning rates varied from 0.01 to 0.5.
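The core operation a CNN applies to each raw image, a learned convolution followed by a nonlinearity, can be illustrated with a hand-picked edge-detecting kernel. The tiny image below is hypothetical; a trained DSP network would learn many such kernels from data:

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (strictly cross-correlation, as in most DL
    libraries): slide the kernel over the image and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def relu(fmap):
    """The nonlinearity applied to each feature map after convolution."""
    return [[max(0, x) for x in row] for row in fmap]

# A vertical-edge kernel applied to a tiny image with an edge down the middle.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(relu(conv2d(image, kernel)))  # [[0, 2, 0], [0, 2, 0]]
```

The feature map responds only where the edge lies, which is the kind of low-level cue stacked convolutional layers build on to separate plastic from nonplastic items.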
Recommendation has become an important mobile application on location-based social networks (LBSNs), especially when users travel to a new place far away from home. Compared to traditional recommender systems, this type of recommendation is very challenging. A user on a geo-social network usually visits only a very limited number of spatial items (points of interest), resulting in a sparse user-item matrix. As most users tend to visit spatial items near their homes, the user-item matrix becomes even sparser when users travel to a distant place. Another major challenge is that users' interests and behavior patterns tend to vary dramatically across different time periods and geographical regions. In this chapter, we focus on effective spatial item recommendation by exploiting both spatial and temporal information on geo-social networks. To address these challenges, we propose ST-SAGE, a spatial-temporal sparse additive generative (SAGE) model for spatial item recommendation. ST-SAGE considers both the personal interests of users and the preferences of the crowd in the target region at the given time by exploiting both the co-occurrence patterns and the content of spatial items. To further alleviate the data-sparsity issue, ST-SAGE exploits geographical correlation by smoothing the crowd's preferences over a well-designed spatial index structure called the spatial pyramid. To speed up the training of ST-SAGE, we implement a parallel version of the model inference algorithm on the GraphLab framework. We conduct extensive experiments, and the results clearly demonstrate that ST-SAGE outperforms state-of-the-art recommender systems in terms of recommendation effectiveness, model-training efficiency, and online recommendation efficiency.
Hackers spread malware for various reasons, but the motives are frequently financial. Malicious Android campaigns designed to steal credit card and banking information from infected devices have been the most common, often even using the official Google Play Store to trick victims into entering their credit card data. An innocent user does not know whether the application he is about to download is safe or malicious. The key idea is to build a central server that gathers users' application data and performs static and dynamic analysis of an Android application to locate risky patterns and classify the application as malicious or benign. This task requires huge processing power, and here the importance of big data comes into the picture. Existing and proposed frameworks have been tremendously helpful, and big data is the main driving force behind recommendation frameworks. Our planned mechanism additionally gathers a large amount of user data, for example, the total number of downloads, user reviews, permissions required by an application, and developer data, to provide relevant and powerful recommendations. There is a need for a dynamic malware analysis system which uses the technologies of graphical user interface (GUI)-based testing, big data analysis, and machine learning to identify malicious Android applications. The system can be used in conjunction with other existing efforts to enhance the detection rate of malware.
Big data recommender systems are very vulnerable to attacks, especially profile injection attacks, so security mechanisms are needed to protect them from different kinds of attacks. These vulnerabilities and attacks may decrease users' trust in the accuracy of recommender systems. In addition, issues related to big data recommender systems and their security stem from security challenges in the Hadoop architecture and its Hadoop Distributed File System (HDFS). In this chapter, we investigate a number of known attack models, examine their influence, and suggest some solutions to combat them. Furthermore, we present different methods that attackers use to modify an attack so that it is not recognized as such. We consider important issues in creating secure big data recommender systems, focusing on attack models and their effect on different big data recommendation approaches. We characterize the general effect on a system's ability to predict accurately, as well as the amount of knowledge attackers need about the system to deploy a realistic attack. In this chapter, we show that the two basic approaches, i.e., user-based and item-based algorithms, are particularly vulnerable to attack patterns, but hybrid algorithms that combine user-based and item-based algorithms may exhibit higher stability. We also study the basics of relevant research and advanced schemes, and discuss future directions.
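A profile injection ("shilling") attack of the kind analysed here can be demonstrated on a toy system: injected profiles mimic average users on filler items while pushing the target item to the maximum rating. The data and the deliberately simple mean-based recommender below are hypothetical illustrations:

```python
from statistics import mean

# Genuine profiles: user -> {item: rating}. The target item is poorly rated.
profiles = {
    "u1": {"target": 1, "f1": 4, "f2": 3},
    "u2": {"target": 2, "f1": 5, "f2": 2},
    "u3": {"target": 1, "f1": 3, "f2": 4},
}

def item_mean(item):
    return mean(p[item] for p in profiles.values() if item in p)

before = item_mean("target")

# "Average attack": injected profiles rate filler items near their current
# means (to resemble genuine users) and push the target to the maximum.
for n in range(5):
    profiles[f"shill{n}"] = {"target": 5,
                             "f1": round(item_mean("f1")),
                             "f2": round(item_mean("f2"))}

after = item_mean("target")
print(before, after)  # the target item's mean rating jumps
```

Five shill profiles are enough to lift the target's mean from about 1.3 to over 3.5, illustrating why attack detection and robust hybrid algorithms matter at big data scales, where injecting thousands of profiles is cheap.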
Recommender systems have become an integral part of many social networks and extract knowledge from a user's personal and sensitive data both explicitly, with the user's knowledge, and implicitly. This trend has created major privacy concerns, as users are mostly unaware of what data, and how much of it, are being used, and of how securely they are used. In this context, several works have addressed privacy concerns for online social network data and recommender systems. This chapter surveys the main privacy concerns, measurements, and privacy-preserving techniques used in large-scale online social networks and recommender systems. It draws on previous work on security, privacy preservation, statistical modeling, and datasets to provide an overview of the technical difficulties and problems associated with privacy preservation in online social networks.
Recommender systems are experiencing a significant boost due to the availability of big data, which supplies an abundance of user data such as past purchases and browsing history. The benefits increase when a recommender system can use and combine data that come from multiple sites and paint a more complete picture of the user's preferences and interests. However, such data often originate from dispersed, heterogeneous sources, and before processing and analyzing them, it is necessary to integrate or link them. The problem of linking such data consists of identifying data that refer to the same real-world entity across the heterogeneous sources and is known as record linkage or entity resolution. As these data also concern human activities, privacy issues arise when linking data across different sources; the resulting problem is known as privacy-preserving record linkage (PPRL). In this chapter, we propose a parallel protocol for PPRL based on phonetic encodings that exploits novel big data processing engines to provide results of high quality in an efficient manner. Our phonetic encoding scheme extends the work presented by Karakasidis and Verykios, which is based on the Soundex phonetic algorithm. The protocol also features noise generation to prevent frequency attacks, and encryption of both actual and fake data to enable processing by an untrusted party. To cope with the low recall that a large percentage of dirty data may incur, we propose combining Soundex with another popular phonetic algorithm, namely NYSIIS. By combining two phonetic encodings, our protocol becomes more robust and more tolerant of errors in the matching fields, as it introduces redundancy. Furthermore, as Soundex is particularly vulnerable to errors that occur at the beginning of the encoded text, our protocol deploys a further optimization by encoding the reverse of the original text with the second phonetic algorithm.
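For reference, American Soundex, the first of the two phonetic encodings the protocol combines, can be sketched as follows. This is a minimal implementation of the standard algorithm; the chapter's protocol layers NYSIIS, text reversal, noise generation, and encryption on top of it:

```python
def soundex(name):
    """American Soundex: the first letter plus three digits derived from
    consonant classes, so similar-sounding names share a code."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    digits, prev = [], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":        # h/w do not break a run of equal codes
            prev = code
    return (name[0].upper() + "".join(digits) + "000")[:4]

# Similar-sounding names collide, which is what makes the encoding useful
# for approximate, privacy-preserving matching of dirty person names.
print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
```

Because "Robert" and "Rupert" map to the same code, records with such spelling variations can be blocked and matched without exchanging the raw names, at the cost of some false matches that the second encoding helps filter out.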
Security attacks are one of the major threats in today's world. These attacks exploit vulnerabilities in a system or online site for financial gain, causing huge losses in revenue and reputation for both government and private firms. Such attacks are generally carried out through malware, intrusions, and phishing uniform resource locators (URLs). Techniques exist such as signature-based detection, anomaly detection, and stateful protocol analysis for detecting intrusions, and blacklisting for detecting phishing URLs. Even though these techniques claim to thwart cyberattacks, they often fail to detect new attacks or variants of existing attacks, both because of the dynamic nature of attacks and because of the lack of annotated data. In such a situation, we need a system which can capture the changing trends of cyberattacks to some extent; for this, we used supervised and unsupervised learning techniques. The growing problem of intrusions and phishing URLs generates a need for a reliable architectural solution that can efficiently identify both. This chapter aims to provide a comprehensive survey of intrusion and phishing URL detection techniques and deep learning. It presents and evaluates a highly effective deep learning architecture to automate intrusion and phishing URL detection. The proposed method is an artificial intelligence (AI)-based hybrid architecture for an organization which provides supervised and unsupervised solutions to tackle intrusion and phishing URL detection. The prototype model uses various classical machine learning (ML) classifiers and deep learning architectures. The research specifically focuses on detecting and classifying intrusions and phishing URLs.
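A few of the lexical URL features commonly fed to phishing-URL classifiers can be extracted as below. The feature set and example URLs are hypothetical illustrations; the chapter's deep architecture learns its own representations from data:

```python
import re

def url_features(url):
    """Extract simple lexical cues often used in phishing-URL detection
    (a hypothetical, illustrative feature set)."""
    return {
        "length": len(url),
        "num_dots": url.count("."),
        "has_at": "@" in url,          # '@' can hide the real host
        "has_ip": bool(re.search(r"//\d{1,3}(\.\d{1,3}){3}", url)),
        "num_hyphens": url.count("-"),
        "suspicious_words": sum(w in url.lower()
                                for w in ("login", "verify", "secure", "update")),
    }

legit = url_features("https://www.example.com/docs")
phish = url_features("http://192.168.0.1/secure-login.verify-account.example.com@evil")
print(phish["has_ip"], phish["has_at"], phish["suspicious_words"])  # True True 3
```

Such feature vectors are what classical ML classifiers consume, whereas a deep architecture can instead operate on the raw character sequence of the URL and learn these cues implicitly.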