zymr-data-cubes-a-tool-for-business

A Data cube is a way of modeling a relational database data warehouse for rapid reporting and analytical querying. End users can rapidly and efficiently query data cubes to answer business or research questions.

OLTP vs. OLAP

There are two main formats for data modeling: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). OLTP data modeling focuses on ensuring rapid storage of data and data integrity. Accordingly, OLTP data models are highly normalized and thus very little or no data is stored in multiple places (data redundancy). OLAP data modeling, on the other hand, is optimized for rapid reads, and in order to facilitate that, data models are highly de-normalized and thus have lots of data redundancy.

Building a Data Cube: Measures and Dimensions

An OLAP data cube consists of a couple of basic building blocks: measures (sometimes also called “facts”) and dimensions. Measures are raw data that analysts are interested in: sales numbers, employee counts, or numbers of manufacturing defects. If it’s countable, it can be a measure. Dimensions are the various attributes that are associated with the measures. Dimensions can be time-based (e.g. day, hour, month), location based (e.g. region or retail store), business-based (e.g. company division, product line), or anything else that’s relevant to a measure.

Each record in the data cube’s fact table consists of all the dimensions and all the measures associated with the dimensions. So for an example of a retail sales data fact table, each record might store:

  • Sales (measure)
  • Date (dimension)
  • Day of week (dimension)
  • Hour of day (dimension)
  • Store number (dimension)
  • Store state (dimension)
  • Store region (dimension)

A fact table set up like this would allow an analyst to rapidly answer questions like:

  • “Which hour of the week has the most revenue, on average?”
  • “Which store produced the most revenue last month?”
  • “What are the top three regions for revenue on Wednesdays so far this year?”

Additional facts can be added to a fact table as long as they have the same dimensions. For example, the hypothetical business in this case could add employee expenses by store by hour, or profitability metrics by store by hour if those data points exist.

Astute readers will notice that this fact table has lots of data redundancy. For example, any given store will be assigned to one state and one region only. So, state and region are derivable from the store number. But adding the store state and store region to the cube allows rapid analysis of states and regions. There are different ways to manage dimensions, including keeping dimensions in hierarchies. Each approach has different pros and cons.

Loading a Data Cube

Data cubes need to be loaded from OLTP systems. Tools to move data are called Extract, Transform and Load (ETL) tools, and there are a number of them on the market. The two main ways to load data cubes are deleting all the data from the data cube and reload it regularly from the OLTP (sometimes called “truncate and load”) or adding new data to the cube and modifying any existing data as needed (sometimes called “delta load”). Truncate and load is going to be quick to program and less prone to data errors, but the load itself will be time-consuming, potentially leading to large periods of time when the data is totally unavailable. Delta loads take longer to program and troubleshooting data errors can be difficult, but the data is always available and each load is less resource-intensive than a truncate and load.

Usage of Data Cubes

Various tools for exploring data cubes exist in the marketplace. Most commercial reporting tools will have the ability to query a data cube and provide an end user with an intuitive interface for interacting with the data. Custom tools for this kind of exploration of data cubes can also be built using, for example, Oracle’s OLAP Java API.

Data cubes are best suited for environments where analysts are asking open-ended questions about the data and using the answers to those questions to lead them down new analytical routes. They do not excel at providing regular canned reports, though they can perform that function. They are, however, excellent for rapid prototyping of new reports

Data cubes are a powerful and flexible way to enable businesses to answer questions about the business. Data cubes store data in a format that allows end users to rapidly sum business measures across different facets of the business.

This article was authored by Hardik Shah, who is a Software Architect at Zymr.

Recent Posts

Popular Posts

0 comments

Leave a Reply

© 2017, Zymr, Inc. All Rights Reserved.