

Data mining - Wikipedia


Data mining

Data mining is the practice of searching large stores of data for patterns. In the technical context of data warehousing the term is neutral. It also has a wider, pejorative usage that implies imposing patterns (particularly causal relationships) on data where none exist.

Data mining has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1].

It is also known as knowledge discovery in databases (KDD).

Used in the pejorative sense, "data mining" implies scanning the data for any relationship and, once one is found, coming up with an interesting explanation for it. The problem is that large data sets almost invariably contain some striking relationships that are peculiar to that data and arise by chance. Conclusions reached this way are therefore likely to be highly suspect. Even so, some exploratory work is always required in applied statistical analysis to get a feel for the data, so the line between good statistical practice and data mining is not always clear.
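The chance-relationship problem can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function names `pearson` and `mine_noise`, the column and row counts, and the seed are all hypothetical choices made for illustration. It fills a table with pure random noise, searches every pair of columns for the strongest correlation, and then re-measures that same pair on rows that were held back from the search.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mine_noise(n_columns=200, n_rows=30, seed=0):
    """Search unrelated random columns for the strongest pairwise correlation.

    Every column is independent Gaussian noise, so any relationship found
    is an artifact of the search. Sizes and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Generate twice as many rows as we search, keeping half as a holdout.
    columns = [[rng.gauss(0.0, 1.0) for _ in range(2 * n_rows)]
               for _ in range(n_columns)]
    search = [col[:n_rows] for col in columns]
    holdout = [col[n_rows:] for col in columns]

    best_pair, best_r = None, 0.0
    for i in range(n_columns):
        for j in range(i + 1, n_columns):
            r = pearson(search[i], search[j])
            if abs(r) > abs(best_r):
                best_pair, best_r = (i, j), r

    # Re-measure the "discovered" pair on rows the search never saw.
    i, j = best_pair
    holdout_r = pearson(holdout[i], holdout[j])
    return best_pair, best_r, holdout_r


if __name__ == "__main__":
    pair, mined_r, holdout_r = mine_noise()
    print("strongest pair:", pair)
    print("correlation on searched rows:", round(mined_r, 3))
    print("correlation on held-out rows:", round(holdout_r, 3))
```

With roughly 20,000 pairs to choose from, the search can be expected to turn up a pair that looks strongly related even though no relationship exists, and the held-out measurement has no reason to agree with it. Checking a mined pattern against data that played no part in finding it is one common safeguard.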

Here is an example. The insurance industry has found that people with poor credit records tend to be more likely to make car insurance claims, and insurers have modified their pricing accordingly. While this appears to be a legitimate finding, politicians in the United States have questioned it on the "common-sense" grounds that how a person handles a credit card does not affect how they handle a car. A finding that is statistically legitimate might therefore not hold up to public scrutiny.

A more significant danger is finding correlations that do not really exist. An example comes from the investment website The Motley Fool. In the late 1990s the website recommended an investment portfolio known as the Foolish Four, which was based on a data mining analysis of trends in the stock market. Further research in the early 2000s showed that the correlations were an artifact of the particular data set used rather than a reflection of reality. This is one of many similar false findings linked to the stock market.

There are also privacy concerns associated with data mining. For example, an employer with access to medical records may screen out people who have diabetes or have had a heart attack. Screening out such employees cuts insurance costs, but it creates ethical and legal problems.

There are many legitimate uses of data mining. For example, a database of all prescription drugs taken by people can be used to find combinations of drugs that cause an adverse reaction. Because a given combination may occur in only 100 people, with the reaction in 10 of them, a single case may not raise a red flag. A database covering all patients could reveal such reactions and save lives. However, there is huge potential for abuse of such a database.
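The prescription example can be sketched as a simple tally. This is a minimal illustration, not a real pharmacovigilance method: the record layout (a set of drug names plus a reaction flag), the function name `reaction_rates_by_pair`, and the drug names and counts in the sample data are all hypothetical. It counts, for each pair of drugs taken together, how many people took the pair and how many of them had a reaction.

```python
from collections import Counter
from itertools import combinations


def reaction_rates_by_pair(records):
    """Tally adverse reactions for every pair of drugs taken together.

    records: iterable of (drugs, had_reaction) tuples, where drugs is a
    collection of drug names (a hypothetical layout for this sketch).
    Returns {pair: (reactions, people, rate)}.
    """
    people = Counter()
    reactions = Counter()
    for drugs, had_reaction in records:
        # Sort so that ("A", "B") and ("B", "A") count as the same pair.
        for pair in combinations(sorted(set(drugs)), 2):
            people[pair] += 1
            if had_reaction:
                reactions[pair] += 1
    return {pair: (reactions[pair], people[pair], reactions[pair] / people[pair])
            for pair in people}


if __name__ == "__main__":
    # Made-up data: 100 people take A with B and 10 of them react;
    # 900 people take A with C and 9 of them react.
    records = (
        [({"A", "B"}, True)] * 10 + [({"A", "B"}, False)] * 90
        + [({"A", "C"}, True)] * 9 + [({"A", "C"}, False)] * 891
    )
    for pair, (hit, total, rate) in sorted(reaction_rates_by_pair(records).items()):
        print(pair, hit, "of", total, "rate", rate)
```

No single record here stands out on its own; only the aggregated rate for the pair does. Because such a scan examines many pairs at once, it is subject to the chance-relationship caveat described earlier, so a flagged pair is a lead for further investigation rather than a conclusion.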

In short, data mining yields information that would not otherwise be available, but that information must be properly interpreted to be useful. When the data collected involves individual people, it raises many questions of privacy, legality, and ethics.

References

[1] W. Frawley, G. Piatetsky-Shapiro, and C. Matheus, "Knowledge Discovery in Databases: An Overview," AI Magazine, Fall 1992, pp. 213-228.


Note: if you got here by looking for the rapper KDD, see KDD (rapper).

wikipedia.org dumped 2003-03-17 with terodump




 
 