Google Search

Sunday, August 31, 2008

SCHEMAS IN DATA WAREHOUSING

12.1 STAR SCHEMA

The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The centre of the star consists of a large fact table and the points of the star are the dimension tables.

A star schema is characterised by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other.

A typical fact table contains keys and measures. In the example given below, the fact table, sales, contain the measures quantity_sold, amount, and cost, and the keys cust_id, time_id, prod_id, channel_id, and promo_id. The dimension tables are customers, times, products, channels, and promotions. The product dimension table, contains information about each product number that appears in the fact table.

A star join is a primary key to foreign key join of the dimension tables to a fact table.

The main advantages of star schemas are that they:
• Provide a direct and intuitive mapping between the business entities being analysed by end users and the schema design.
• Provide highly optimised performance for typical star queries.
• Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contain dimension tables

Star schemas are used for both simple data marts and very large data warehouses


12.2 SNOWFLAKE SCHEMA



Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table.
The snowflake model is the result of decomposing one or more of the dimensions, which sometimes have hierarchies themselves.

For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The figure below presents a graphical representation of a snowflake schema.

No comments: