
Friday, June 27, 2008

Data Mining

What Is Data Mining?
Too much data and not enough information: this is a problem facing many
businesses and industries. Most businesses have an enormous amount of data with a
great deal of information hidden within it, and hidden is usually how it stays:
so much data exists that it overwhelms traditional methods of data analysis.
Data mining provides a way to get at the information buried in the data. Data mining
creates models to find hidden patterns in large, complex collections of data, patterns
that sometimes elude traditional statistical approaches to analysis because of the large
number of attributes, the complexity of patterns, or the difficulty in performing the
analysis.

What Is Data Mining in the Database?
Data mining projects usually require a significant amount of data collection and data
processing before and after model building. Data tables are created by combining
many different types and sources of information. Real-world data is often dirty, that is,
it includes wrong or missing values, so it must often be cleaned before it can be used.
Data is filtered, normalized, sampled, transformed in various ways, and eventually
used as input to data mining algorithms. Up to 80% of the effort in a data mining
project is often devoted to data preparation. When the data is stored as a table in a
database, data preparation can be performed using database facilities.
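As a rough illustration of preparing data with database facilities, the sketch below filters out wrong values, fills in missing values, and normalizes a column entirely in SQL. Everything specific in it is an assumption made for the example: SQLite (through Python's standard sqlite3 module) stands in for whatever database holds the data, the customers table and its columns are hypothetical, and mean imputation and min-max scaling are just two of many possible cleaning and normalization choices.

```python
import sqlite3


def prepare_in_database(rows):
    """Clean and normalize (id, age, income) rows inside SQLite.

    Returns a list of (id, age, income_scaled) tuples ordered by id.
    The table, view, and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, income REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

    # The view does all of the preparation in SQL:
    #   - filtering: rows with a missing or impossible age are excluded
    #   - cleaning: a missing income is replaced by the mean known income
    #   - normalizing: income is min-max scaled to the range [0, 1]
    # NULLIF avoids dividing by zero when every income is identical.
    conn.execute(
        """
        CREATE VIEW prepared_customers AS
        SELECT c.id,
               c.age,
               (COALESCE(c.income, s.mean) - s.lo) / NULLIF(s.hi - s.lo, 0)
                   AS income_scaled
        FROM customers AS c
        CROSS JOIN (
            SELECT AVG(income) AS mean, MIN(income) AS lo, MAX(income) AS hi
            FROM customers
            WHERE age BETWEEN 0 AND 120
        ) AS s
        WHERE c.age BETWEEN 0 AND 120
        """
    )

    result = conn.execute(
        "SELECT id, age, income_scaled FROM prepared_customers ORDER BY id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, 34, 20000.0),
        (2, 51, None),      # missing income
        (3, -5, 99999.0),   # wrong age
        (4, 28, 60000.0),
    ]
    for row in prepare_in_database(sample):
        print(row)
```

With the sample rows, the customer with age -5 is dropped and the scaled incomes come out as 0.0, 0.5, and 1.0. Because the preparation is defined as a view, the raw table is left untouched and the prepared data reflects whatever is currently stored.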
Data mining models have to be built, tested, validated, managed, and deployed in
their appropriate application domain environments. The data mining results may need
to be post-processed as part of domain-specific computations (for example, calculating
estimated risks, expected utilities, and response probabilities) and then stored in
permanent databases or data warehouses.
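A minimal sketch of such post-processing follows, assuming a model has already produced a response probability for each customer. It turns each probability into an expected profit and stores the result in a database table. The profit formula (probability times revenue per response, minus cost per contact), the campaign_scores table, and both function names are assumptions chosen for illustration, not part of any particular data mining product.

```python
import sqlite3


def store_expected_profit(conn, scores, revenue_per_response, cost_per_contact):
    """Post-process model scores and store them in the database.

    scores is an iterable of (customer_id, response_probability) pairs.
    Expected profit here is a simple, assumed utility:
        probability * revenue_per_response - cost_per_contact
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS campaign_scores ("
        "customer_id INTEGER PRIMARY KEY, "
        "response_prob REAL, "
        "expected_profit REAL)"
    )
    rows = [
        (customer_id, prob, prob * revenue_per_response - cost_per_contact)
        for customer_id, prob in scores
    ]
    # INSERT OR REPLACE lets a rerun overwrite earlier scores for a customer.
    conn.executemany("INSERT OR REPLACE INTO campaign_scores VALUES (?, ?, ?)", rows)
    conn.commit()


def profitable_customers(conn, min_profit):
    """Return the ids of customers whose expected profit exceeds min_profit."""
    cursor = conn.execute(
        "SELECT customer_id FROM campaign_scores "
        "WHERE expected_profit > ? ORDER BY customer_id",
        (min_profit,),
    )
    return [customer_id for (customer_id,) in cursor]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    store_expected_profit(connection, [(1, 0.5), (2, 0.25)], 100.0, 10.0)
    print(profitable_customers(connection, 20.0))
```

Once the scores are stored, other applications can query them with ordinary SQL instead of rerunning the model.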
Making the entire data mining process work in a reproducible and reliable way is
challenging; it may involve automation and transfers across servers, data repositories,
applications, and tools. For example, some data mining tools require that data be
exported from the corporate database and converted to the data mining tool’s format;
data mining results must then be imported back into the database. Removing or reducing
these obstacles lets data mining be used more often to extract more valuable
information and, in many cases, to make a significant impact on an enterprise's
bottom line. Data mining in the database makes the data movement required by tools
that do not operate in the database unnecessary and makes it much easier to mine
up-to-date data. Also, the less data is moved, the less time the entire data mining
process takes.
Data movement can make data insecure. If data never leaves the database, database
security protects the data.
In summary, data mining in the database provides the following benefits:
■ Less data movement
■ More data security
■ Up-to-date data
