This book reviews some of the underlying technologies and also some recent applications in a number of fields. In a world increasingly overloaded with data of varying quality, not least via the Internet, computerised tools are becoming essential to "mine" useful information from the mass of data available.
Inspec keywords: neural nets; electricity supply industry; scientific information systems; medical computing; data mining; geophysics computing
Other keywords: data mining; technical issues; medical diagnosis; neural networks; electricity supply industry; organic chemistry; knowledge discovery; weather forecasting
Subjects: Public utility administration; Neural computing techniques; Knowledge engineering techniques; Biology and medical computing; Database management systems (DBMS); Geophysics computing; Business applications of IT; Public utilities
In concept learning, the features used to describe examples can have a drastic effect on the learning algorithm's ability to acquire the target concept. In many poorly understood domains, the representation can be described as low level: examples are described in terms of a large number of small measurements. No single measurement is strongly correlated with the target concept, yet all the information needed for classification is believed to be present; patterns are harder to identify because they are conditional. This is in contrast to problems where a small number of attributes are highly predictive of the concept. A. Clark and C. Thornton (1997) call these type-2 and type-1 problems, respectively. Many current approaches perform very poorly on type-2 problems because the biases they employ are poorly tuned to the underlying concept. We discuss why it is desirable to estimate concept difficulty before any learning takes place. We then describe some current approaches to learning type-2 problems, together with several measures used to estimate particular sources of difficulty and their advantages and disadvantages, and present an estimate based on the Δj measure (K. Nazar and M.A. Bramer, 1997) which addresses many of the shortcomings of previous approaches.
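As a rough illustration of the distinction (a minimal sketch of ours, not the Δj measure itself), the following Python fragment applies a crude per-attribute dependence score to a parity concept and to a concept that simply copies one attribute; all names and data are illustrative:

    from itertools import product

    def single_attribute_score(examples, attr):
        # Crude dependence score: |P(class=1 | attr=1) - P(class=1 | attr=0)|.
        def p(cond):
            rows = [c for x, c in examples if x[attr] == cond]
            return sum(rows) / len(rows)
        return abs(p(1) - p(0))

    bits = list(product([0, 1], repeat=4))
    type1 = [(x, x[0]) for x in bits]        # class copies attribute 0
    type2 = [(x, sum(x) % 2) for x in bits]  # parity: purely conditional patterns

    for name, data in [("type-1", type1), ("type-2", type2)]:
        print(name, [round(single_attribute_score(data, a), 2) for a in range(4)])
    # type-1: attribute 0 scores 1.0; type-2: every attribute scores 0.0

On the parity data every attribute scores zero even though the class is fully determined by the attributes jointly, which is why biases tuned to per-attribute predictiveness fail on type-2 concepts.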
The handling of anomalous or outlying observations in a dataset is one of the most important tasks in data preprocessing, for three reasons. First, outlying observations can have a considerable influence on the results of an analysis. Secondly, although outliers are often measurement or recording errors, some of them can represent phenomena of interest, significant from the viewpoint of the application domain. Thirdly, for many applications, the exceptions identified can lead to the discovery of unexpected knowledge. We propose an algorithm for outlier analysis in which outliers detected by statistical methods, and the context in which they occur, are examined by acquiring and applying relevant domain knowledge. In particular, we try to establish domain-specific hypotheses which may explain the outlying data points.
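The two-stage idea can be sketched as follows (illustrative Python with placeholder domain rules, not the knowledge base used in the chapter): statistical detection flags candidates, then domain knowledge is applied to explain them:

    from statistics import mean, stdev

    def statistical_outliers(values, k=2.0):
        m, s = mean(values), stdev(values)
        return [i for i, v in enumerate(values) if abs(v - m) > k * s]

    def explain(record, rules):
        # Return the first domain hypothesis that accounts for the record.
        for hypothesis, applies in rules:
            if applies(record):
                return hypothesis
        return None  # unexplained: a candidate error, or a genuine discovery

    readings = [{"temp": t, "sensor": "A"} for t in [20, 21, 19, 20, 22, 55]]
    rules = [("sensor A drifts when hot",
              lambda r: r["sensor"] == "A" and r["temp"] > 50)]
    for i in statistical_outliers([r["temp"] for r in readings]):
        print(readings[i], "->", explain(readings[i], rules))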
This chapter presents an approach to rule discovery in which the strategy of targeting a restricted class of rules is combined with a technique for their efficient discovery. Attribute values in the dataset are distributed among the outcome classes in such a way that the attribute values associated with an outcome class are more likely, on a heuristic basis, to appear as conditions of a discovered rule with that outcome class on the right-hand side (RHS). The discovered rules are exact rules, with additional properties depending on the heuristic used to distribute attribute values among the outcome classes.
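A minimal sketch of the distribution idea, with illustrative data and a frequency-based heuristic of our own choosing, might look as follows:

    from collections import Counter, defaultdict

    def distribute_values(rows, classes):
        # Assign each (attribute, value) pair to its most frequent outcome class.
        counts = defaultdict(Counter)
        for row, c in zip(rows, classes):
            for a, v in row.items():
                counts[(a, v)][c] += 1
        return {av: cnt.most_common(1)[0][0] for av, cnt in counts.items()}

    def is_exact(rows, classes, conds, target):
        # An exact rule: every row satisfying the conditions has the target class.
        covered = [c for row, c in zip(rows, classes)
                   if all(row.get(a) == v for a, v in conds)]
        return bool(covered) and all(c == target for c in covered)

    rows = [{"outlook": "sunny", "windy": "no"},
            {"outlook": "rain", "windy": "yes"},
            {"outlook": "sunny", "windy": "yes"}]
    classes = ["play", "stay", "play"]
    assignment = distribute_values(rows, classes)
    candidates = [av for av, c in assignment.items() if c == "play"]
    print(candidates, is_exact(rows, classes, [("outlook", "sunny")], "play"))

Only values assigned to a class are tried as conditions for rules with that class on the RHS, which narrows the search while keeping the discovered rules exact.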
This chapter describes two major themes: partial values and database background knowledge. First, we show that the partial-value data model is a useful extension to the relational-database model. As well as increasing the expressivity of the data model, partial values allow the data miner to deal with concept hierarchies rigorously. We show how an iterative procedure allows us to calculate aggregate proportions for a database table; the procedure is well founded in statistical theory, being a maximum-likelihood estimator. It is possible that the Newton-Raphson procedure described by J.M. Jamshidian and R.I. Jennrich (1997) will provide a means of speeding up the solution of the maximum-likelihood equations. Secondly, we demonstrate how background knowledge about the database can be used, indicating how to reengineer the database using logic programming and integrity constraints. The aggregation algorithms are extended to the multiattribute case, and we show how they are computed where integrity constraints limit the allowed combinations of attribute values.
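The flavour of the iterative maximum-likelihood procedure can be conveyed by an EM-style sketch (ours, not the chapter's algorithm verbatim) in which a partial value is a set of possible categories:

    def estimate_proportions(records, categories, iters=50):
        # EM-style iteration: each record is a set of possible categories.
        p = {c: 1.0 / len(categories) for c in categories}  # uniform start
        for _ in range(iters):
            expected = {c: 0.0 for c in categories}
            for possible in records:
                z = sum(p[c] for c in possible)
                for c in possible:
                    expected[c] += p[c] / z        # E-step: share the record out
            p = {c: expected[c] / len(records) for c in categories}  # M-step
        return p

    # Three records are known exactly; two are partial (either category possible).
    records = [{"car"}, {"car"}, {"bike"}, {"car", "bike"}, {"car", "bike"}]
    print(estimate_proportions(records, ["car", "bike"]))  # car -> 2/3, bike -> 1/3

Each partial record is shared among its possible categories in proportion to the current estimates, and the proportions converge to the maximum-likelihood values.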
Data mining is a field which potentially offers knowledge that is not explicitly stored for a particular application domain. In most application areas that have been studied for data mining, the time at which something happened is also known and recorded (e.g. the date and time when a point-of-sale transaction took place, or when a patient's temperature was taken). Most existing approaches, however, take a static view of an application domain, so that the discovered knowledge is considered to be valid indefinitely on the time line. If data mining is to be used as a vehicle for better decision making, the existing approaches will in most cases lead to results that are neither significant nor interesting. Consider, for example, a possible association between butter and bread (i.e. people who buy butter also buy bread) among the transactions of a supermarket. If someone looks at all transactions available, say for the past ten years, that association might, with a certain confidence, be true. If, however, the highest concentration of people who bought butter and bread is found up to five years ago, then the association is not significant for the supermarket's present and future. To address this, a temporal framework for data mining is proposed; within this framework, an SQL-like mining language is also proposed, with which any temporal data-mining task can easily be expressed.
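The temporal point can be made concrete with a small sketch (the transaction layout and names are assumptions of ours, not the proposed language): computing an association's support per time window shows whether it still holds:

    from datetime import date

    def windowed_support(transactions, itemset, windows):
        # Support of `itemset` inside each (start, end) window.
        out = {}
        for start, end in windows:
            in_window = [items for d, items in transactions if start <= d < end]
            hits = sum(itemset <= items for items in in_window)
            out[(start, end)] = hits / len(in_window) if in_window else 0.0
        return out

    transactions = [(date(2014, 3, 1), {"bread", "butter"}),
                    (date(2014, 5, 2), {"bread", "butter"}),
                    (date(2019, 6, 3), {"bread"}),
                    (date(2019, 7, 4), {"milk"})]
    windows = [(date(2014, 1, 1), date(2015, 1, 1)),
               (date(2019, 1, 1), date(2020, 1, 1))]
    print(windowed_support(transactions, {"bread", "butter"}, windows))
    # full support in the old window, zero in the recent one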
Data mining and online analytical processing (OLAP) are two complementary techniques for analysing large amounts of data in data-warehouse environments to support decision-support queries. In this chapter we examine the gap between these two techniques, propose a feedback sandwich model to combine OLAP and data mining, and propose an integrated architecture. Our model and architecture differ from other proposals in three ways: they address the overall process of OLAP and data mining; they offer flexibility for both loosely coupled and tightly coupled combinations of the two (since no particular structure is imposed on the extended OLAP/data-mining engine); and they allow discovered knowledge to be fed back to enhance future OLAP and data mining. We hope that the opinions presented in this chapter will stimulate further research on the integration of OLAP and data mining.
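A deliberately generic sketch of the feedback loop, with stand-in functions of our own rather than the chapter's engine, might look like this:

    def olap_aggregate(cube, dims, focus=None):
        # Group-by style rollup over `dims`, optionally restricted by feedback.
        rows = cube if focus is None else [r for r in cube if focus(r)]
        agg = {}
        for r in rows:
            key = tuple(r[d] for d in dims)
            agg[key] = agg.get(key, 0) + r["sales"]
        return agg

    def mine(agg, threshold):
        # Toy miner: flag cells whose total exceeds a threshold.
        return [key for key, total in agg.items() if total > threshold]

    cube = [{"region": "N", "year": 2023, "sales": 120},
            {"region": "S", "year": 2023, "sales": 30},
            {"region": "N", "year": 2024, "sales": 150}]
    agg = olap_aggregate(cube, ["region"])              # OLAP pass
    hot = mine(agg, threshold=100)                      # mining pass
    focused = olap_aggregate(cube, ["region", "year"],  # feedback pass
                             focus=lambda r: (r["region"],) in hot)
    print(hot, focused)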
This chapter describes a number of empirical studies in the use of the data mining approach to the analysis of health information. The examples serve to highlight the factors perceived as influencing the success or otherwise of the data mining approach in each case and to illustrate the difficulties which may be encountered during the data mining process and how they may be overcome.
At some time in their lives, 60-80 percent of the population will experience an episode of low-back pain (LBP), of whom 90 percent will get better within six to eight weeks without the need for treatment or investigation. The remaining ten percent incur 70-90 percent of the medical costs arising from low-back pain and represent a challenge to health practitioners in providing an accurate diagnosis and successful management. The aim of this chapter is to discover, using a knowledge discovery method, the key inputs which a low-back pain multilayer perceptron (MLP) network uses to classify selected training-case examples, and to show how a rule can then be directly induced from each training example. Preliminary results are presented of the top-ranked key inputs which the LBP MLP uses to classify all training cases for each diagnostic class. It is shown how validation of the top-ranked key inputs by the domain experts can lead to the validation of the LBP MLP network during both training and testing.
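As an illustration only (a simple weight-based saliency ranking, not necessarily the chapter's knowledge discovery method), key inputs for a single case can be ranked from a trained network's weights and turned into a rule; the input names below are hypothetical:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid = 6, 4
    W1 = rng.normal(size=(n_hid, n_in))  # input-to-hidden weights (trained; random here)
    W2 = rng.normal(size=(n_hid,))       # hidden-to-output weights

    def rank_inputs(x):
        # Saliency per input: sum over hidden units of |w_ih| * |w_ho|, scaled by |x|.
        saliency = np.abs(W1 * x).T @ np.abs(W2)
        return np.argsort(saliency)[::-1]

    names = ["pain_radiates", "age_over_50", "numbness",
             "night_pain", "lifting_injury", "smoker"]   # hypothetical inputs
    case = np.array([1, 0, 1, 0, 1, 0], dtype=float)
    top = rank_inputs(case)[:2]
    rule = " AND ".join(f"{names[i]}={int(case[i])}" for i in top)
    print(f"IF {rule} THEN class=simple_LBP")            # illustrative induced rule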
Meteorological societies and universities worldwide routinely collect vast amounts of data from satellites and weather stations. We were asked to examine a sample of such data and look for patterns which may exist between certain geographical locations over time. The overall aim of the work is to generate a set of rules which can be used to predict conditions in certain grid squares a number of months in advance; these predictions can be used by meteorologists to make medium- to long-term forecasts.
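The kind of rule sought can be sketched as follows, under an assumed data layout of one monthly series per grid square; the series here are synthetic:

    import numpy as np

    def lagged_corr(a, b, lag):
        # Correlation between series a and series b shifted `lag` months later.
        return np.corrcoef(a[:-lag], b[lag:])[0, 1]

    months = np.arange(48)
    square_a = np.sin(months / 6.0)            # monthly series for grid square A
    square_b = np.sin((months - 3) / 6.0)      # square B follows A by three months

    best = max(range(1, 13), key=lambda lag: lagged_corr(square_a, square_b, lag))
    print(f"RULE: square A predicts square B {best} months ahead "
          f"(r={lagged_corr(square_a, square_b, best):.2f})")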
Geophysical data is the most important material that meteorologists use to model the behaviour of the Earth's atmosphere and oceans. While most research dedicated to explanation and prediction has been based on the application of a specific statistical or artificial-intelligence technique, few endeavours have tackled the holistic nature of the subject. One possible way of approaching this target is the application of knowledge discovery techniques. The authors designed the meteorology and data mining environment (MADAME), which has proved a promising platform for further research in this area. The work was motivated by a project in which the authors were involved, whose objective was to establish the feasibility of forecasting high-intensity rainfall over different areas of the territory of Hong Kong using data mining techniques, with a view to improving the existing landslide warning system.
The main aim of this work was to conduct a feasibility trial to determine whether data mining could provide an enabling technology for the pharmaceutical industry, giving researchers the capability to determine the common key characteristics of compounds that determine their functionality, irrespective of compound size. A secondary aim was to investigate whether the powerful lazy evaluation offered by the functional programming language Gofer could be applied to this task, to which it would appear to be ideally suited.
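The chapter's implementation used Gofer; as a loose analogue of the lazy style, the following Python sketch uses generators so that candidate patterns are produced on demand and enumeration stops at the first characteristic common to all compounds (the feature encoding of compounds as sets of strings is an assumption):

    from itertools import combinations

    def candidate_patterns(features, max_size=3):
        # Lazily yield feature combinations, smallest first.
        for size in range(1, max_size + 1):
            yield from combinations(sorted(features), size)

    def first_common_pattern(compounds):
        # Enumerate patterns on demand; stop at the first one shared by all.
        for pattern in candidate_patterns(compounds[0]):
            if all(set(pattern) <= compound for compound in compounds):
                return pattern

    compounds = [{"ring", "OH", "N"}, {"ring", "OH", "Cl"}, {"ring", "OH"}]
    print(first_common_pattern(compounds))  # ('OH',)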
Data mining is the process of analysing data in order to extract useful information. Many techniques can be used in the analysis of data, one of which is the artificial neural network. We explain what a neural network is and how one was used to analyse electricity consumption data from a utility in the UK.
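A minimal sketch, using synthetic data rather than the utility's, of the kind of model described: a small feed-forward network trained to predict the next period's consumption from the preceding periods:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    hours = np.arange(0, 24 * 14, 0.5)               # two weeks, half-hourly
    load = 50 + 20 * np.sin(2 * np.pi * hours / 24)  # a synthetic daily demand cycle

    # Windows of four past readings predict the next reading.
    X = np.array([load[i:i + 4] for i in range(len(load) - 4)])
    y = load[4:]

    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    model.fit(X[:-48], y[:-48])                      # hold out the final day
    print("held-out R^2:", round(model.score(X[-48:], y[-48:]), 3))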