The Data Capability Maturity Model: Part 1 - Domain Modeling

In this series of articles, we develop a capability maturity model for a data initiative and, ultimately, for a data-driven company.  In this first article, we focus on the domain model, which is the language we need in order to have conversations about the data.  After reading this article, you will understand the historical context for domain modeling and have a powerful tool to assess the practice of domain modeling in your organization.

Review

In our last post, we discussed several key elements of a data initiative:

  1. Planning

    1. Strategy

    2. Requirements

    3. Architecture

    4. Domain Model

    5. Methodology / Life Cycle
       

  2. Engineering

    1. Data Pipelines

    2. Data Warehousing

    3. Data Analytics

    4. [Reporting and Presentation]
       

  3. Operations

    1. Event Logging

    2. Alerting

    3. Metrics

    4. Validation

    5. Watchdog

As you may have experienced, not every initiative addresses these concerns equally well.  It's understandable that project teams focus on their own areas of interest and expertise.  However, inadequately addressing any one of these areas will prevent the team from realizing the expected business value or ROI.  What we need is a tool to quickly assess the current state of the initiative or organization and to guide the team along a path of improvement.

Enter the Rubric

In these situations, I like to use a rubric to describe and measure quality, clarify expectations and improve practices.  A well-known example of a rubric in action is the Software Engineering Institute's Capability Maturity Model (CMM).  The CMM defines a rubric based on five levels of maturity:

  1. Initial (chaotic, ad hoc, individual heroics) - the starting point for use of a new or undocumented repeat process.

  2. Repeatable - the process is at least documented sufficiently such that repeating the same steps may be attempted.

  3. Defined - the process is defined/confirmed as a standard business process.

  4. Capable - the process is quantitatively managed in accordance with agreed-upon metrics.

  5. Efficient - process management includes deliberate process optimization/improvement.

The CMM, while rigorous, is often criticized as too bureaucratic for agile development teams.  In its defense, I believe this reputation stems largely from the way organizations have chosen to interpret and apply the model.  The CMM itself does not prescribe the approach used to reach each level. However, the levels do assume that quantitative, continuous process improvement is the ultimate goal.  Agile philosophy puts more emphasis on principles than on process. So, we'll borrow the CMM's level names but use them more loosely to map out expectations in each of these areas, rather than adhering to the CMM's formal definitions of the levels.

Rubric for Domain Modeling

Here is my rubric for assessing the practice of domain modeling:

Area of Concern: Domain Modeling

  1. Initial - No defined requirements or practices exist for ensuring consistent use of language across the team, customers, and stakeholders.

  2. Repeatable - A data dictionary and relational model are maintained that guide the organization and meaning of the data.

  3. Defined - The data dictionary captures the existing variation and breadth of language across the organization. Conflicts still exist.

  4. Capable - The organization has an ongoing conversation or practice to resolve naming discrepancies and promote a ubiquitous language.

  5. Efficient - A well-established nomenclature or professional discourse exists that sets the expectations for communication in the domain. It is widely accepted within the organization.
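To use the rubric as a quick self-assessment, it can be encoded as data. The sketch below is a hypothetical illustration rather than part of the model itself: it assumes the levels are cumulative, so an organization sits at the highest level for which that level and every level below it are satisfied. The names `Level`, `DOMAIN_MODELING_RUBRIC` and `assess_level`, and the yes/no questions, are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    number: int
    name: str
    criterion: str  # hypothetical yes/no question an assessor answers


# Hypothetical encoding of the domain-modeling rubric.
# Level 1 is the starting point, so it has no criterion to meet.
DOMAIN_MODELING_RUBRIC = [
    Level(1, "Initial", ""),
    Level(2, "Repeatable", "Are a data dictionary and relational model maintained?"),
    Level(3, "Defined", "Does the dictionary capture variation in language across the organization?"),
    Level(4, "Capable", "Is there an ongoing practice to resolve naming discrepancies?"),
    Level(5, "Efficient", "Is there a widely accepted nomenclature for the domain?"),
]


def assess_level(answers: dict[int, bool]) -> Level:
    """Return the highest level whose criterion, and every lower one, is met.

    `answers` maps a level number (2-5) to the assessor's yes/no answer.
    Missing answers count as "no".
    """
    current = DOMAIN_MODELING_RUBRIC[0]
    for level in DOMAIN_MODELING_RUBRIC[1:]:
        if not answers.get(level.number, False):
            break  # assumed cumulative: a gap stops the climb
        current = level
    return current


if __name__ == "__main__":
    # A team with a published dictionary that hasn't yet catalogued conflicts.
    result = assess_level({2: True, 3: False, 4: True})
    print(f"Level {result.number} - {result.name}")  # Level 2 - Repeatable
```

Treating the levels as cumulative is a design choice: in the example, the "yes" at Level 4 does not count because Level 3 has not been met. A team that prefers to score each criterion independently could report the raw answers instead.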

Let’s take a step back and look at how a simple rubric like this helps explain common breakdowns in a data initiative.  

The Old Order ... of the 90s

First, despite the advancement of technology, the idea of a data-driven organization with a ubiquitous language is not a new one.  Arguably, in some ways, corporate America reached its pinnacle in this respect in the mid-1990s and is now struggling to regain this ground, albeit at much higher rates of productivity.

Why is this?  In the mid 90’s, corporations had successfully adopted sophisticated ERP systems that automated business practices which had been reasonably stable for decades.  These businesses were data-driven in the sense that much of revenue growth was driven by financial management based on GAAP and cost accounting practices. The professional discourses of Accounting and Finance provided standards and conventions for collecting, organizing, and analyzing data to make business decisions. These were taught with a high degree of uniformity at universities across the country and around the world and embodied within their business function in every company. As a result, the productive conversation was about the data, not about how to talk about the data, productivity increased and costs dropped. MIS could draw on a large and reasonably interchangeable labor pool that spoke the same language.

The Warring States Period

Then what happened?  All the balls were thrown up in the air. The business models changed, the volume and breadth of the data changed, the technology changed, and the language ballooned and morphed.  I like to call this the warring states period, because it parallels the roughly 200-year period of general chaos in ancient China that preceded a unified state.

As a result, we are now dealing with business processes that generate much higher economic output per worker (productivity) but frequently suffer from ineffective and inefficient interactions.  More time is spent talking about how to talk about the data. More time is lost to misunderstandings between customers, departments and team members.  This in turn increases the cost, complexity and time needed to collect, organize and analyze data, and ultimately opens a gap in expectations.

There is good and bad news here.  The bad news is that the warring states period is not going to end anytime soon; the rate of change is only increasing. The good news is that revenue growth is to be found in embracing and exploiting this very complexity. This is what it now means to be a data-driven company: a company that can harness data in a continuously changing environment is able to compete and exploit new opportunities while maintaining a backbone of stability.

Building the Domain Model

If we yield to the above chaos, we find ourselves at Level 1. Data is organized however the person working with it sees fit, and it's up to the next person to figure it out if they too want to use the data. We see this when different power users create their own versions of similar spreadsheets, each with different numbers.

The next step is to build, publish and share a data dictionary and relational model that explain where the data is and how it is organized.  This is Level 2.  A forward-thinking analyst or engineer has taken the initiative.  But some people aren't listening! They aren't reading the documentation, and the documentation doesn't reflect the way they think about the data.  We see this when we realize the data is there, but it's not really there for all users, because not all users can see it. The spreadsheets continue.

In Level 3, we start to formalize the conversation about how to talk about the data.  We bring different users, analysts and developers together and begin to understand the breadth of the language and the sources of misunderstanding.  Fewer meetings have to be repeated. More users can see the data, so they can start sharing and trusting it and stop reinventing the wheel.
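The Level 3 idea of a data dictionary that records the existing variation in language, rather than hiding it, can be sketched in a few lines. The example below is hypothetical: the departments, terms and definitions are invented, and `find_conflicts` is an illustrative helper, not a standard tool. It flags every term that different departments define differently, which is the kind of misunderstanding this level aims to surface.

```python
from collections import defaultdict

# Hypothetical data dictionary entries: (department, term, definition).
# At this stage the goal is to record how each group actually uses a term,
# even when the groups disagree.
ENTRIES = [
    ("Sales", "customer", "Any account with a signed contract"),
    ("Support", "customer", "Any account that has opened a ticket"),
    ("Finance", "customer", "Any account that has been invoiced"),
    ("Sales", "churn", "Contract not renewed at term end"),
    ("Finance", "churn", "Contract not renewed at term end"),
]


def find_conflicts(entries):
    """Return {term: {definition: [departments]}} for terms with >1 definition."""
    usage = defaultdict(lambda: defaultdict(list))
    for department, term, definition in entries:
        # Normalize the term so "Customer" and "customer" are compared together.
        usage[term.strip().lower()][definition].append(department)
    return {
        term: {definition: sorted(depts) for definition, depts in defs.items()}
        for term, defs in usage.items()
        if len(defs) > 1
    }


if __name__ == "__main__":
    for term, definitions in find_conflicts(ENTRIES).items():
        print(f"Conflict on '{term}':")
        for definition, departments in definitions.items():
            print(f"  {', '.join(departments)}: {definition}")
```

Keeping every department's definition side by side, instead of picking a winner up front, is what separates this stage from Level 4, where the organization actively works to resolve these conflicts.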

These conversations may have started impromptu and largely bottom-up from the project team, but now we see a concerted effort to resolve the misunderstandings.  Executives recognize the importance of naming and bring different departments together to tighten up the language. We are now at Level 4.  The data can now be shared freely within the organization.

Our focus turns toward transcendence: how do we maintain this as staff changes and the business evolves?  How do we onboard new people and collaborate with other companies? This brings us to Level 5.  We are aligning our domain model with industry standards.  Hopefully, we have been doing this from the beginning, but here we see the results: alignment across the industry, and more effective sharing and onboarding. An understanding of the discourse is part of the job requirements, and there is an online course on the subject offered by the local university.  We have arrived.

Summary

In this article, we revisited the key elements of a data initiative and introduced the rubric as a tool to better measure and understand our projects.  We then applied it to the area of domain modeling. Domain modeling is the practice of developing the language and understanding we need to discuss the data; it is a prerequisite to any meaningful conversation. The result is a five-level rubric for assessing the practice of domain modeling in your organization or project. We then provided historical context to illustrate the importance of domain modeling to realizing the ROI of your project, and closed with a narrative of a project team's experiences as it matures through these levels.

We hope this tool helps you assess the domain model of any data initiative and better understand its impact.  Look out for our next article, where we extend our rubric to the next major area of concern. See you next time!

Doug Tung