Is Data Modeling Dead? The Answer May Surprise You | 7wData

Is Data Modeling Dead? The Answer May Surprise You | 7wData

An article on Substack entitled The Death of data modeling has aroused heated discussions among data practitioners. According to the article, many of those data practitioners feel that the process is no longer suitable for today’s needs. The author suggests that this is because data modeling has been overtaken by “the proliferation of Agile, the shift to engineering-lead organizations, and implementation friction.” But reports of the death of data modeling are premature. It has not died but, like most useful things in tech, it has evolved.



In response, this article elaborates on why people think data modeling is dead and looks at the pain points and challenges enterprises face with the old approach to data modeling. The concept of Data Modeling 2.0 is introduced, and then we then explain how this new generation of data modeling can help enterprises overcome the challenges associated with traditional data modeling to achieve more governed and outstanding self-service data analysis.



At its core, data modeling is the organization of an enterprise’s data and the creation of models that make it easy for its users to consume in a meaningful and consistent way. However, the chaos attending the use of data models has affected the delivery of expected results as the metrics employed by business users are often ambiguous and unaligned.



This chaos has caused many CIOs to grow anxious about the effects of traditional data models as their organizations and data stores scale. For example, if a company uses a medium-sized business intelligence (BI) or data warehouse, it will soon be flooded with aggregation tables since different ETL tasks and aggregation tables will be created by each engineering-lead team. Correspondingly, hundreds or thousands of tables will need to be created; and if each report has more than ten metrics, there will be tens or even hundreds of thousands of business metrics with no assurance of consistent definitions, or that the data is even being used.



For analysts, the traditional data model delivery process has a long time-to-insight (TTI) cycle, and can only be used to analyze the already developed, known data, rather than gain insight into an unknown area, which is of little business help. For engineers, it means a lot of repetitive development work, high development overhead, and a low sense of achievement.



The traditional delivery mode centered on data models usually requires a series of development processes: requirement analysis, data sourcing, ETL pipeline implementation, dashboard creation, and user acceptance testing (UAT). For example, this data delivery process typically takes 12 business days for a real company. That is unacceptable by today’s on-demand standards.



Besides, as data engineers usually need to serve more than one team, they lack sufficient understanding of the business behind the data models, nor do they have enough manpower to keep up with changing business needs. This leads to a long development and delivery cycle, which also means the business team needs to get involved deeply to make sure data requirements go live properly.



Business users are also in an awkward position. When they need the data, they do not know where to find trusted data–or even if there is trusted data available–which greatly impacts their productivity. As more data tables are produced by different data teams, a large number of tasks and aggregation tables in these tables are not properly governed, and data consumers (business users) soon don’t know which data should be used to answer their questions.

Images Powered by Shutterstock