Data Warehouse Modeling: Star Schema vs. Snowflake Schema
Data warehouses can be modeled with entity-relationship (ER) modeling or with dimensional modeling, in which one or more fact tables use one or more dimension tables. A star schema maintains one-to-many relationships between the dimensions and the fact table, and the fact table can contain thousands or even millions of rows. A design may also need to connect multiple diagnoses to a fact table [KRRT98]. Entity-relationship diagramming is a commonly used tool, and an ER model can be translated into a star schema with a set of transformation rules or an algorithm. Families of star schemas occur when there are multiple fact tables.
Once again, visually the snowflake schema reminds us of its namesake, with several layers of dimension tables creating an irregular snowflake-like shape.
Normalization
As mentioned, normalization is a key difference between star and snowflake schemas. There are a couple of things to know about it. First, snowflake schemas use less space to store dimension tables, because a normalized database, as a rule, stores far fewer redundant records.
Second, denormalized data models increase the chances of data integrity problems, and these issues also complicate future modifications and maintenance. In my personal opinion (not a hard fact), experienced data modelers will find the snowflake schema more logically organized than the star schema.
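To make the space argument concrete, here is a minimal Python sketch, assuming a hypothetical four-store dimension (the store names, cities, and countries are invented for illustration). It counts how many times each city and country name is physically stored in a denormalized, star-style dim_store versus snowflake-style store, city, and country tables linked by surrogate keys.

```python
from collections import Counter

# Hypothetical store rows: (store_name, city, country); values are illustrative.
stores = [
    ("Store 1", "Berlin", "Germany"),
    ("Store 2", "Berlin", "Germany"),
    ("Store 3", "Munich", "Germany"),
    ("Store 4", "Paris", "France"),
]

# Star schema: one denormalized dim_store row per store, so city and country
# names are repeated for every store that shares them.
star_dim_store = [
    {"store_id": i, "store_name": s, "city": c, "country": k}
    for i, (s, c, k) in enumerate(stores, start=1)
]

# Snowflake schema: cities and countries get their own tables, each name is
# stored once, and child rows point at parents through surrogate keys.
dim_country = {}  # country name -> country_id
dim_city = {}     # city name -> (city_id, country_id)
snow_dim_store = []
for i, (s, c, k) in enumerate(stores, start=1):
    country_id = dim_country.setdefault(k, len(dim_country) + 1)
    city_id, _ = dim_city.setdefault(c, (len(dim_city) + 1, country_id))
    snow_dim_store.append({"store_id": i, "store_name": s, "city_id": city_id})

# Count how many times each city/country name is physically stored.
star_copies = Counter(r["city"] for r in star_dim_store)
star_copies.update(r["country"] for r in star_dim_store)
snow_copies = Counter(list(dim_city) + list(dim_country))

print(sum(star_copies.values()), star_copies["Germany"])  # 8 3
print(sum(snow_copies.values()), snow_copies["Germany"])  # 5 1
```

The snowflake version stores each name once, but it pays for that with extra key columns and, at query time, extra joins.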
Query Complexity
In our first two articles, we demonstrated a query that could be used on the sales model to get the quantity of all phone-type products sold in Berlin stores. In the star schema, that query only joins the fact table with the dimension tables it needs. In the snowflake schema, the dimension tables are normalized, so we need to dig deeper to get the name of the product type and the city: we have to add another JOIN for every new level inside the same dimension.
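The difference in join depth can be sketched with Python's built-in sqlite3 module. This is a minimal, hypothetical version of the sales model: the table and column names (fact_sales, dim_product, dim_product_type, dim_store, dim_city) and the sample rows are assumptions, and the time dimension is left out for brevity. Both queries return the quantity of phone-type products sold in Berlin stores; the snowflake query needs one extra JOIN per normalized level.

```python
import sqlite3

# Hypothetical fact rows shared by both models: (product_id, store_id, quantity).
SALES = [(1, 1, 5), (2, 1, 3), (1, 2, 7), (3, 1, 4)]

def build_star():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, product_type TEXT);
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, quantity INTEGER);
        INSERT INTO dim_product VALUES (1, 'Phone A', 'phone'), (2, 'Phone B', 'phone'), (3, 'Tablet C', 'tablet');
        INSERT INTO dim_store VALUES (1, 'Berlin'), (2, 'Paris');
    """)
    con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", SALES)
    return con

def build_snowflake():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_product_type (product_type_id INTEGER PRIMARY KEY, product_type_name TEXT);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, product_type_id INTEGER);
        CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city_name TEXT);
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city_id INTEGER);
        CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, quantity INTEGER);
        INSERT INTO dim_product_type VALUES (1, 'phone'), (2, 'tablet');
        INSERT INTO dim_product VALUES (1, 'Phone A', 1), (2, 'Phone B', 1), (3, 'Tablet C', 2);
        INSERT INTO dim_city VALUES (1, 'Berlin'), (2, 'Paris');
        INSERT INTO dim_store VALUES (1, 1), (2, 2);
    """)
    con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", SALES)
    return con

# Star: the fact table joins each dimension it needs exactly once.
STAR_QUERY = """
    SELECT SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store s ON s.store_id = f.store_id
    WHERE p.product_type = 'phone' AND s.city = 'Berlin'
"""

# Snowflake: one extra JOIN for each normalized level (product type, city).
SNOWFLAKE_QUERY = """
    SELECT SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_product_type pt ON pt.product_type_id = p.product_type_id
    JOIN dim_store s ON s.store_id = f.store_id
    JOIN dim_city c ON c.city_id = s.city_id
    WHERE pt.product_type_name = 'phone' AND c.city_name = 'Berlin'
"""

star_total = build_star().execute(STAR_QUERY).fetchone()[0]
snowflake_total = build_snowflake().execute(SNOWFLAKE_QUERY).fetchone()[0]
print(star_total, snowflake_total)  # 8 8
```

Both models give the same answer; the snowflake query simply has to walk two more tables to get there.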
Each join takes time, because the DBMS has to match rows across tables before it can return a result. Data that lives inside the same table is also more likely to be stored physically close together on disk.
As a result, a query run against a snowflake schema data mart will generally execute more slowly than the same query against a star schema.
Speeding Things Up
To speed up reporting, we can aggregate data to the level we need in reports, which compresses the data significantly, and give users only the data they need for analysis and reports.
Which Should You Use?
Consider using the snowflake schema for the data warehouse itself: as the warehouse is Data Central for the company, we could save a lot of space this way.
Also consider it when dimension tables require a significant amount of storage space. In most cases, the fact tables take up most of the space, but the dimension tables could contain a lot of redundant-but-needed attributes. In our example, we used the city attribute to describe the city where the store is located.
Note the zero-to-many relationship on the child table. Therein lies the problem.
Drill-down incompleteness
Moving from left to right, we drill into the data values of all the sectors.
When we look at the data, we see that the minimal date value on the parent sector table is different from the minimal date value on the child department table. This is because treasury values are not broken down in the department table the way the other sectors' values are.
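A minimal sqlite3 sketch of drill-down incompleteness, with hypothetical sector, department, and fact_amount tables. It compares summed amounts rather than the dates mentioned above, and it assumes that facts carry a sector_id plus a nullable department_id, since the treasury sector has no departments. Drilling down through an inner join to department drops the treasury rows; adding a default treasury department, the fix described next, closes the gap.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sector (sector_id INTEGER PRIMARY KEY, sector_name TEXT);
    -- Child table: a sector may have zero departments (Treasury has none).
    CREATE TABLE department (department_id INTEGER PRIMARY KEY, sector_id INTEGER, department_name TEXT);
    -- Assumption: facts always carry a sector and a nullable department.
    CREATE TABLE fact_amount (sector_id INTEGER, department_id INTEGER, amount INTEGER);
    INSERT INTO sector VALUES (1, 'Retail'), (2, 'Treasury');
    INSERT INTO department VALUES (10, 1, 'Retail North'), (11, 1, 'Retail South');
    INSERT INTO fact_amount VALUES (1, 10, 100), (1, 11, 50), (2, NULL, 70);
""")

SECTOR_LEVEL = """SELECT SUM(f.amount) FROM fact_amount f
                  JOIN sector s ON s.sector_id = f.sector_id"""
# Drilling down: the inner join to department drops rows with no department.
DEPARTMENT_LEVEL = """SELECT SUM(f.amount) FROM fact_amount f
                      JOIN department d ON d.department_id = f.department_id"""

def totals():
    return tuple(con.execute(q).fetchone()[0] for q in (SECTOR_LEVEL, DEPARTMENT_LEVEL))

before = totals()

# Fix: add a default Treasury department and connect the unallocated facts to it.
con.execute("INSERT INTO department VALUES (12, 2, 'Treasury (default)')")
con.execute("UPDATE fact_amount SET department_id = 12 WHERE sector_id = 2 AND department_id IS NULL")
after = totals()

print(before, after)  # (220, 150) (220, 220)
```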
To fix this, we need to add a default child value for the parent in the child entity and create a connection between them. In our example, we could add a default treasury value in the department dimension. Another solution, and the more common one, is to add a default value for all unallocated sectors.
Roll-Up Incompleteness
Roll-up incompleteness is the reverse of drill-down incompleteness. Moving from a more granular child dimensional table to a less granular parent table, we get a smaller total, which indicates that not all child values are allocated to parent values.
Example
The fact table is connected to the child table in a typical snowflake schema. In this case, we have two dimensional tables, a child and its parent. Again, this is the problem. And again, although this is a valid modeling construct, it opens the door to errors. Moving from left to right, we go from a finer grain to a coarser grain.
We roll up the data. Notice the difference in total values on different grain levels.
How to Resolve Roll-Up Incompleteness
As roll-up and drill-down incompleteness are mirror images of each other, so are their solutions.
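A minimal sqlite3 sketch of roll-up incompleteness and its mirror-image fix. The fact table references the child product table, and one product has no category (a NULL category_id). Rolling up to category through inner joins loses that product's sales; adding an unallocated category value and connecting the orphaned product to it makes the totals agree. Table names, column names, sample rows, and the id 0 used for the unallocated category are all assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    -- Child table: a NULL category_id means the product is not allocated to a parent.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT, category_id INTEGER);
    -- The fact table is connected to the child (product) table.
    CREATE TABLE fact_sales (product_id INTEGER, amount INTEGER);
    INSERT INTO category VALUES (1, 'Phones');
    INSERT INTO product VALUES (1, 'Phone A', 1), (2, 'Phone B', 1), (3, 'Gadget X', NULL);
    INSERT INTO fact_sales VALUES (1, 40), (2, 60), (3, 25);
""")

PRODUCT_LEVEL = """SELECT SUM(f.amount) FROM fact_sales f
                   JOIN product p ON p.product_id = f.product_id"""
# Rolling up: the inner join to category drops products without a parent.
CATEGORY_LEVEL = """SELECT SUM(f.amount) FROM fact_sales f
                    JOIN product p ON p.product_id = f.product_id
                    JOIN category c ON c.category_id = p.category_id"""

def totals():
    return tuple(con.execute(q).fetchone()[0] for q in (PRODUCT_LEVEL, CATEGORY_LEVEL))

before = totals()

# Fix: add an unallocated category and connect the orphaned products to it.
con.execute("INSERT INTO category VALUES (0, 'Unallocated')")
con.execute("UPDATE product SET category_id = 0 WHERE category_id IS NULL")
after = totals()

print(before, after)  # (125, 100) (125, 125)
```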
We need to add a default value to the parent entity for the unallocated child values and connect them to it. In this example, it would be an unallocated category value.
Non-Strict Dimension Relationships
This is another issue that we can easily identify. Unlike roll-up and drill-down incompleteness, a non-strict dimensional relationship problem is a pure design error.
Watch for it when there are many-to-many relationships in the model. The Vertabelo modeling tool does not allow many-to-many relationships, and a good rule of thumb in dimensional modeling, and in modeling in general, is to avoid them.
In this case, I will display the model with two one-to-many relationships. We have two dimensional tables, one for weeks and one for months. The relationship between them is many-to-many because each month can have many weeks and one week can be in two months.
Non-strict Incompleteness
We represent the weeks of a year as a sequence of numbers and the months of a year as a list of names. The sum of sales by week is different from the sum of sales by month because some weeks overlap. This happens when a week falls into two months or when certain months, such as March, are not in our data period.
The data displayed is correct, but it cannot be rolled up.
How to Resolve Non-Strict Dimension Relationships
We solve this error by placing the dimensions in different hierarchies. If we look at the model for this solution, we see two hierarchies that are independent of one another, which eliminates the roll-up operation from week to month. Alternatively, we can make the relationship strict by choosing a single parent: in the above case, we would define one major category, the major parent, out of many categories.
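Both remedies can be sketched in plain Python under stated assumptions: hypothetical daily sales of one unit per day over two ISO weeks starting Monday 2024-01-29, so that week 5 spans January and February. Rolling weekly totals up through the many-to-many week-to-month link double-counts week 5. Computing weeks and months as independent hierarchies from the daily data keeps both correct, and the "major parent" variant assigns each week to exactly one month; the rule used here (the month of the week's first day) is an arbitrary choice for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily sales: one unit per day for two full ISO weeks,
# starting on Monday 2024-01-29, so ISO week 5 spans January and February.
start = date(2024, 1, 29)
daily_sales = {start + timedelta(days=i): 1 for i in range(14)}

def week_of(d):
    return d.isocalendar()[1]  # ISO week number

weekly = defaultdict(int)       # independent hierarchy: day -> week
monthly = defaultdict(int)      # independent hierarchy: day -> month
week_months = defaultdict(set)  # non-strict week <-> month link
for d, qty in daily_sales.items():
    weekly[week_of(d)] += qty
    monthly[d.month] += qty
    week_months[week_of(d)].add(d.month)

# Anti-pattern: roll weekly totals up to months through the many-to-many link.
# Week 5 is counted in both January and February.
week_to_month = defaultdict(int)
for week, qty in weekly.items():
    for month in week_months[week]:
        week_to_month[month] += qty

# Strict alternative: give each week one "major parent" month
# (hypothetical rule: the month of the week's first day in the data).
first_day = {}
for d in sorted(daily_sales):
    first_day.setdefault(week_of(d), d)
major_parent = defaultdict(int)
for week, qty in weekly.items():
    major_parent[first_day[week].month] += qty

print(dict(weekly))         # {5: 7, 6: 7}
print(dict(monthly))        # {1: 3, 2: 11}
print(dict(week_to_month))  # {1: 7, 2: 14}, i.e. 21 units instead of 14
print(dict(major_parent))   # {1: 7, 2: 7}
```

The major-parent variant adds up correctly, but its January figure now means "weeks assigned to January" rather than calendar January, which is the trade-off of forcing a strict relationship.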
Dimension-Fact Summarizability Problems
Dimension-fact summarizability problems are found in operations between fact and dimensional tables. Like dimensional summarizability problems, they are evident in the erroneous cardinalities of summarized data. When looking at dimension-fact summarizability problems, we commonly see two modeling anti-patterns. The first relates to joining fact table values with dimensional table data that does not cover all of them.
The second relates to a non-strict relationship between values in fact and dimensional tables.
Incomplete Dimension-Fact Relationships
Incomplete dimension-fact relationship problems manifest themselves in join operations between fact and dimensional tables.
They occur when the fact table contains measures with no corresponding value in the dimensional table. Summary calculations on the fact table vary depending on the dimensional tables we are using for our calculation. You may have already noticed the incomplete relation to the customer table.
As with the incomplete dimensional model, therein lies the problem. In the first scenario, we must display the sum of all balances for customers on a monthly grain.
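A minimal sqlite3 sketch of this first scenario, assuming a hypothetical customer dimension and fact_balance table in which customer 3 has balances but no customer row. Summing balances by month straight from the fact table and through the customer join gives different answers; a LEFT JOIN ... IS NULL anti-join is one simple way to find the fact rows responsible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE fact_balance (customer_id INTEGER, month TEXT, balance INTEGER);
    INSERT INTO customer VALUES (1, 'Customer A'), (2, 'Customer B');
    -- Customer 3 has balances but no row in the customer dimension.
    INSERT INTO fact_balance VALUES
        (1, '2024-01', 100), (2, '2024-01', 200), (3, '2024-01', 300),
        (1, '2024-02', 110), (3, '2024-02', 50);
""")

# Monthly sum straight from the fact table.
FACT_ONLY = "SELECT month, SUM(balance) FROM fact_balance GROUP BY month ORDER BY month"
# The same sum "for customers": the inner join silently drops customer 3.
VIA_CUSTOMER = """SELECT f.month, SUM(f.balance) FROM fact_balance f
                  JOIN customer c ON c.customer_id = f.customer_id
                  GROUP BY f.month ORDER BY f.month"""
# Anti-join: fact rows with no matching dimension row.
ORPHANS = """SELECT COUNT(*) FROM fact_balance f
             LEFT JOIN customer c ON c.customer_id = f.customer_id
             WHERE c.customer_id IS NULL"""

fact_only = con.execute(FACT_ONLY).fetchall()
via_customer = con.execute(VIA_CUSTOMER).fetchall()
orphans = con.execute(ORPHANS).fetchone()[0]
print(fact_only)     # [('2024-01', 600), ('2024-02', 160)]
print(via_customer)  # [('2024-01', 300), ('2024-02', 110)]
print(orphans)       # 2
```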