What is Data Mining?
Too much data and not enough information — this is a problem facing many
businesses and industries. Most businesses have an enormous amount of data, with a
great deal of information hiding within it, but "hiding" is usually exactly what it is
doing: So much data exists that it overwhelms traditional methods of data analysis.
Data mining provides a way to get at the information buried in the data. Data mining
creates models to find hidden patterns in large, complex collections of data, patterns
that sometimes elude traditional statistical approaches to analysis because of the large
number of attributes, the complexity of patterns, or the difficulty in performing the
analysis.
What Is Data Mining in the Database?
Data mining projects usually require a significant amount of data collection and data
processing before and after model building. Data tables are created by combining
many different types and sources of information. Real-world data is often dirty; that
is, it includes wrong or missing values, and it must often be cleaned before it can be used.
Data is filtered, normalized, sampled, transformed in various ways, and eventually
used as input to data mining algorithms. Up to 80% of the effort in a data mining
project is often devoted to data preparation. When the data is stored as a table in a
database, data preparation can be performed using database facilities.
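As a rough illustration of preparing data with database facilities, the sketch below cleans and normalizes a small table inside the database rather than exporting it. It uses Python's built-in sqlite3 module only for convenience; the customers table, its columns, and the age limits are hypothetical and are not taken from any particular product.

```python
import sqlite3

def prepare_in_database(rows):
    """Clean and min-max normalize customer rows using SQL inside the database.

    rows: iterable of (id, age, income) tuples; the schema is hypothetical.
    Returns a list of (id, age, income_scaled) tuples ordered by id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, income REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

    query = """
        WITH clean AS (
            -- Cleaning: drop missing incomes and missing or implausible ages.
            SELECT id, age, income
            FROM customers
            WHERE income IS NOT NULL AND age BETWEEN 1 AND 119
        ),
        bounds AS (
            SELECT MIN(income) AS lo, MAX(income) AS hi FROM clean
        )
        -- Normalizing: scale income to the range [0, 1].
        SELECT c.id, c.age,
               CASE WHEN b.hi > b.lo
                    THEN (c.income - b.lo) / (b.hi - b.lo)
                    ELSE 0.0
               END AS income_scaled
        FROM clean AS c CROSS JOIN bounds AS b
        ORDER BY c.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Because the filtering and scaling run as a query, the raw rows never leave the database; only the prepared result is handed on to the mining step.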
Data mining models have to be built, tested, validated, managed, and deployed in
their appropriate application domain environments. The data mining results may need
to be post-processed as part of domain specific computations (for example, calculating
estimated risks, expected utilities, and response probabilities) and then stored into
permanent databases or data warehouses.
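The post-processing step can be as simple as turning a model's response probabilities into an expected profit per customer. The sketch below assumes a hypothetical campaign in which every contact has a fixed cost and every response brings a fixed revenue; the function names and the figures in the usage note are illustrative only.

```python
def expected_profit(response_probability, revenue_if_response, contact_cost):
    """Expected profit of contacting one customer (hypothetical cost model)."""
    return response_probability * revenue_if_response - contact_cost

def select_targets(scored_customers, revenue_if_response, contact_cost):
    """Keep the customers whose expected profit is positive.

    scored_customers: list of (customer_id, response_probability) pairs,
    as might be produced by a previously built model.
    """
    return [
        customer_id
        for customer_id, probability in scored_customers
        if expected_profit(probability, revenue_if_response, contact_cost) > 0
    ]
```

For example, with a revenue of 100 per response and a contact cost of 2, a customer with a 1% response probability would be skipped, while one with a 5% probability would be kept.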
Making the entire data mining process work in a reproducible and reliable way is
challenging; it may involve automation and transfers across servers, data repositories,
applications, and tools. For example, some data mining tools require that data be
exported from the corporate database and converted to the data mining tool’s format;
data mining results must be imported into the database. Removing or reducing these
obstacles can enable data mining to be utilized more frequently to extract more
valuable information and, in many cases, to make a significant impact on the
bottom line of an enterprise. Data mining in the database eliminates the data
movement required by tools that do not operate in the database and makes it much
easier to mine up-to-date data. Also, the less data movement, the less time the entire
data mining process takes.
Data movement can make data insecure. If data never leaves the database, database
security protects the data.
In summary, data mining in the database provides the following benefits:
■ Less data movement
■ More data security
■ Up-to-date data
Friday, June 27, 2008
Saturday, June 21, 2008
What is OLAP, MOLAP, ROLAP, DOLAP, HOLAP? Examples?
OLAP - On-Line Analytical Processing.
Designates a category of applications and technologies that allow the collection, storage, manipulation and reproduction of multidimensional data, with the goal of analysis.
MOLAP - Multidimensional OLAP.
More specifically, this term designates a Cartesian (multidimensional) data structure. MOLAP contrasts with ROLAP: in the former, the joins between tables are already precomputed, which improves performance; in the latter, joins are computed at query time.
Targeted at groups of users because it's a shared environment. Data is stored in an exclusive server-based format. It performs more complex analysis of data.
DOLAP - Desktop OLAP.
Small OLAP products for local multidimensional analysis on the desktop. There can be a mini multidimensional database (using Personal Express), or extraction of a datacube (using Business Objects).
Designed for a single, low-end departmental user. Data is stored in cubes on the desktop. It's like having your own spreadsheet. Since the data is local, end users don't have to worry about performance hits against the server.
ROLAP - Relational OLAP.
Designates one or several star schemas stored in relational databases. This technology permits multidimensional analysis with data stored in relational databases.
Used for large departments or groups because it supports large amounts of data and users.
HOLAP - Hybrid OLAP.
A hybridization of OLAP, which can include any of the above.
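To make the MOLAP and ROLAP contrast concrete, here is a minimal sketch, not how any real OLAP server is implemented. It builds a tiny star schema in SQLite, answers a question ROLAP-style with joins computed at query time, and then MOLAP-style from an aggregate precomputed into an in-memory dictionary. All table names, column names, and figures are made up for illustration.

```python
import sqlite3
from collections import defaultdict

def build_star_schema():
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
        CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount REAL);
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
        INSERT INTO dim_store   VALUES (10, 'east'), (20, 'west');
        INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 20, 7.0),
                                       (2, 10, 3.0), (1, 10, 2.0);
    """)
    return conn

def rolap_total(conn, category, region):
    """ROLAP style: join the fact and dimension tables when the query runs."""
    row = conn.execute("""
        SELECT COALESCE(SUM(f.amount), 0.0)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_store   AS s ON s.store_id   = f.store_id
        WHERE p.category = ? AND s.region = ?
    """, (category, region)).fetchone()
    return row[0]

def build_cube(conn):
    """MOLAP style: do the joins once and store totals per (category, region)."""
    cube = defaultdict(float)
    joined = conn.execute("""
        SELECT p.category, s.region, f.amount
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_store   AS s ON s.store_id   = f.store_id
    """)
    for category, region, amount in joined:
        cube[(category, region)] += amount
    return dict(cube)
```

After `cube = build_cube(conn)`, a lookup such as `cube.get(("books", "east"), 0.0)` returns the same total as `rolap_total(conn, "books", "east")` without touching the tables again. The price is that the cube must be rebuilt when the underlying data changes.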
BI Tools
Types of business intelligence tools
The key general categories of business intelligence tools are:
Spreadsheets[1]
Reporting and querying software: tools that extract, sort, summarize, and present selected data
OLAP
Digital Dashboards
Data mining
Process mining
Business performance management
Except for spreadsheets, these tools are sold as standalone tools, suites of tools, components of ERP systems, or as components of software targeted to a specific industry. The tools are sometimes packaged into data warehouse appliances.
Open Source and Free Business Intelligence Products
Freereporting.com: Free Web-based BI software application by LogiXML
Eclipse BIRT Project: Eclipse-based open source reporting for web applications, especially those based on Java EE.
OpenI: simple web application that does OLAP reporting
Palo (OLAP database): Memory-based OLAP Server (MOLAP) with interface to Microsoft Excel, .NET, PHP, Java and C++
Pentaho: enterprise-class reporting, analysis, dashboard, data mining and workflow capabilities
RapidMiner (formerly YALE): open-source software for intelligent data analysis, knowledge discovery, data mining, predictive analytics, and machine learning useful for business intelligence applications.
SpagoBI: a free business intelligence platform that uses many FOSS tools as analytical engines, integrating them into an infrastructure that offers interoperability and a consistent view across reports, OLAP, data mining, and dashboards over the data warehouse (DWH).
History of BI
Prior to the start of the Information Age in the late 20th century, businesses had to collect data from non-automated sources. Businesses then lacked the computing resources necessary to properly analyze the data, and as a result, companies often made business decisions primarily on the basis of intuition.
As businesses automated their systems, the amount of data increased, but collecting it remained difficult because information could not easily be moved between or within systems. Analysis of information informed long-term decision making, but it was slow, and short-term decisions often still required instinct or expertise. Business intelligence was defined in an October 1958 IBM Journal article by Hans Peter Luhn.[1] Luhn wrote,
In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."
In modern businesses, the use of standards, automation, and specialized software, including online tools, allows large volumes of information to be extracted, transformed, loaded, and warehoused, greatly increasing the speed at which data becomes available.
In 1989 Howard Dresner, later a Gartner Group analyst, popularized BI as an umbrella term to describe a set of concepts and methods to improve business decision-making by using fact-based decision support systems.
What is BI?
The term business intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information and also sometimes to the information itself. The purpose of business intelligence is to support better business decision making. It dates to 1958.[1] D. J. Power explains in "A Brief History of Decision Support Systems,"[2]
BI describes a set of concepts and methods to improve business decision making by using fact-based support systems. BI is sometimes used interchangeably with briefing books, report and query tools and executive information systems. Business Intelligence systems are data-driven DSS.
BI systems provide historical, current, and predictive views of business operations, most often using data that has been gathered into a data warehouse or a data mart and occasionally working from operational data. Software elements support the use of this information by assisting in the extraction, analysis, and reporting of information. Applications tackle sales, production, financial, and many other sources of business data for purposes that include, notably, business performance management. Information may be gathered on comparable companies to produce benchmarks.