Before building a data warehouse, you must conduct a comprehensive business survey to determine relevant objectives and requirements. You need to know the real business requirements and the issues that can be resolved by each business system.
Business survey
As the foundation of building a data warehouse, the in-depth business survey and requirement analysis determine whether the data warehouse can be built successfully. Before building a data warehouse, you need to invite relevant business staff to introduce the specific business. In this way, you can make clear the requirements of analysts and operational staff in each team, and accordingly compose relevant documents.
You can acquire the following information in detail through surveys and interviews:
The organizational structure and division of labor of users. For example, users may come from different departments, such as the data analysis, operations, and maintenance departments. The requirements for the data warehouse vary with the department. Therefore, you need to conduct surveys on different departments.
The overall business architecture of users, the relationships between business modules, and the process of data flow. You need to sort out the overall business data framework.
The key features of various existing business systems and their collected data.
This tutorial uses the e-commerce business of Company A as an example to show how to sort out the business data framework. Company A has four e-commerce business units: investment promotion, supply chain, marketing, and service. The requirements and data application vary with the business unit. Before building a data warehouse, you must specify the target business units and the business requirements to be met.
In addition, you need to learn more about the existing features of each business unit. A feature is tightly coupled with a business unit and corresponds to one or more tables. It serves as a data source for building a data warehouse. The following table lists the features of the marketing business unit.
Feature | E-commerce marketing management of Company A |
Product management | Yes |
Customer management | Yes |
Purchase process | Yes |
Transaction order | Yes |
Customer feedback | Yes |
Yes indicates that the feature is included, whereas No indicates that the feature is excluded.
This tutorial assumes that users are marketing data analysts in the e-commerce marketing department. The required data includes the total sales of a category (such as kitchenware) of the last day in each province, the names of products whose sales rank top 10 in this category, and the purchasing power distribution (per capita consumption) of customers in each province. These types of data are used for marketing analysis. By means of marketing analysis, the final business requirements are to carry out precision marketing for products of this category and increase the total sales. Based on the business survey, this tutorial focuses on analyzing the transaction order feature of the marketing business unit.
Requirement analysis
If the requirements of data analysts and business operational staff are not taken into account, the data warehouse that is built merely based on the business survey may not be easy to use. After completing the business survey, you need to further collect the requirements of data users for in-depth research and analysis.
You can analyze requirements in the following ways:
Communicate with analysts and business operational staff to obtain their requirements.
Research and analyze the existing reports in the report system.
In the requirement analysis phase, you need to sort out the metrics in business analysis results or reports, and clarify the definition and statistic granularity of each metric. A statistic granularity can be used as the input of a dimension. We recommend that you think about the following questions and take them into account during subsequent data modeling:
What dimensions and statistic granularities are used to aggregate business data? What are the standards for measurement? For example, the trading volume is a dimension, and the order quantity is a measure.
How do I design the data warehouse detail (DWD) layer and the data warehouse service (DWS) layer? How do I design the dimension data (DIM) layer? Are there common metrics?
Does data need to be backed up and integrated into the DWS layer?
For example, data analysts need to know the turnover of the kitchenware category in the e-commerce business of Company A. After obtaining this requirement, you need to analyze and specify the dimensions, measures, and statistic granularities for aggregating data. For example, you can specify the category as a dimension, the turnover as a measure, and the entire table as a statistic granularity. In addition, you need to consider how to design detail data and summary data, whether the data is included in a report at the DIM layer, and whether the data needs to be integrated into an aggregate table.
The analysis results of the requirement survey are usually presented in the documents that record atomic metrics and derived metrics.