A Data cube is a way of modeling a relational database data warehouse for rapid reporting and analytical querying. End users can rapidly and efficiently query data cubes to answer business or research questions.
OLTP vs. OLAP
There are two main formats for data modeling: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). OLTP data modeling focuses on ensuring rapid storage of data and data integrity. Accordingly, OLTP data models are highly normalized and thus very little or no data is stored in multiple places (data redundancy). OLAP data modeling, on the other hand, is optimized for rapid reads, and in order to facilitate that, data models are highly de-normalized and thus have lots of data redundancy.
Building a Data Cube: Measures and Dimensions
An OLAP data cube consists of a couple of basic building blocks: measures (sometimes also called “facts”) and dimensions. Measures are raw data that analysts are interested in: sales numbers, employee counts, or numbers of manufacturing defects. If it’s countable, it can be a measure. Dimensions are the various attributes that are associated with the measures. Dimensions can be time-based (e.g. day, hour, month), location based (e.g. region or retail store), business-based (e.g. company division, product line), or anything else that’s relevant to a measure.
Each record in the data cube’s fact table consists of all the dimensions and all the measures associated with the dimensions. So for an example of a retail sales data fact table, each record might store:
A fact table set up like this would allow an analyst to rapidly answer questions like:
Additional facts can be added to a fact table as long as they have the same dimensions. For example, the hypothetical business in this case could add employee expenses by store by hour, or profitability metrics by store by hour if those data points exist.
Astute readers will notice that this fact table has lots of data redundancy. For example, any given store will be assigned to one state and one region only. So, state and region are derivable from the store number. But adding the store state and store region to the cube allows rapid analysis of states and regions. There are different ways to manage dimensions, including keeping dimensions in hierarchies. Each approach has different pros and cons.
Loading a Data Cube
Data cubes need to be loaded from OLTP systems. Tools to move data are called Extract, Transform and Load (ETL) tools, and there are a number of them on the market. The two main ways to load data cubes are deleting all the data from the data cube and reload it regularly from the OLTP (sometimes called “truncate and load”) or adding new data to the cube and modifying any existing data as needed (sometimes called “delta load”). Truncate and load is going to be quick to program and less prone to data errors, but the load itself will be time-consuming, potentially leading to large periods of time when the data is totally unavailable. Delta loads take longer to program and troubleshooting data errors can be difficult, but the data is always available and each load is less resource-intensive than a truncate and load.
Usage of Data Cubes
Various tools for exploring data cubes exist in the marketplace. Most commercial reporting tools will have the ability to query a data cube and provide an end user with an intuitive interface for interacting with the data. Custom tools for this kind of exploration of data cubes can also be built using, for example, Oracle’s OLAP Java API.
Data cubes are best suited for environments where analysts are asking open-ended questions about the data and using the answers to those questions to lead them down new analytical routes. They do not excel at providing regular canned reports, though they can perform that function. They are, however, excellent for rapid prototyping of new reports
Data cubes are a powerful and flexible way to enable businesses to answer questions about the business. Data cubes store data in a format that allows end users to rapidly sum business measures across different facets of the business.
Smartsourcing is a brief guide to the world of modern technology partnerships. It was developed through a collaborative effort of top Zymr executives as we uncovered a gap in the market between the perception of what outsourcing used to be, and how leading technology innovators are leveraging this globalized approach to value generation. Read this guide to learn...