There are competing data warehousing strategies in use across industry. One commonly cited (and false) dichotomy is between the bottom-up and top-down approaches to designing a data warehouse. The difference between the two comes down to the answer to this question:

“Should we try and bring together all of our data into one place so we can answer all of our business questions from it, or should we tackle individual business concerns first?”

The orientation implied by “bottom-up” and “top-down” is one with an organization-wide view of data at the top, and the individual business concerns (and consequently, most of the data products) at the bottom.

The “top” and “bottom” here can be thought of in a classic pyramid model showing the levels of business processes.

Bottom-up: The Kimball method

The bottom-up approach is best known as the Kimball method. This method starts with the specific business processes being modelled, and from those decides on the granularity of the data, and the dimensions and metrics that it will contain. The Kimball method is named for Ralph Kimball, who put forward a precise set of best practices in The Data Warehouse Toolkit in 1996. Two updated editions, co-authored with Margy Ross, have since come out, the most recent in 2013.

This approach is “bottom-up” in the sense that it doesn’t seek to build an overarching initial design for a data warehouse, but instead results in individual data warehouses as the initial output. Overarching organization-level views of data emerge organically from these products in aggregate.

The overall Kimball method contains much more than a decision on whether to start with business concerns, but that is where it is anchored. If you are designing data warehouses (and other data products) in a bottom-up approach, then you are first looking at specific needs of the business, and the consumers of the data within it, in order to produce a data product that meets those needs.

The Kimball method specifically does this in four steps:

  1. Identify the business process of concern
  2. Specify the grain of the data
  3. Define the relevant dimensions
  4. Define the metrics that record the facts of the matter

The output of a run through the Kimball method is a data product (a data mart in this method, considered strictly) that can be used to answer questions about the specific business process it is designed to model.
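To make the four steps a little more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The business process (online orders), the grain (one row per order line), and every table and column name are my own assumptions for illustration. None of them come from Kimball's text, and a real data mart would involve far more than this.

```python
import sqlite3


def build_orders_mart(conn):
    """Create a tiny star schema for one (hypothetical) business process."""
    # Step 1: the business process is "online orders" (an assumption).
    # Step 2: the grain is one row per order line.
    conn.executescript("""
        -- Step 3: dimensions describe the context of each order line.
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            iso_date TEXT,
            month    TEXT
        );
        -- Step 4: the fact table holds the metrics at the declared grain.
        CREATE TABLE fact_order_line (
            order_id    INTEGER,
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key    INTEGER REFERENCES dim_date(date_key),
            quantity    INTEGER,
            revenue     REAL
        );
    """)
    # Made-up sample rows, only so the query below has something to return.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Tripod", "Accessories"),
        (2, "Camera Strap", "Accessories"),
        (3, "Prime Lens", "Lenses"),
    ])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (20230101, "2023-01-01", "2023-01"),
        (20230215, "2023-02-15", "2023-02"),
    ])
    conn.executemany("INSERT INTO fact_order_line VALUES (?, ?, ?, ?, ?)", [
        (100, 1, 20230101, 1, 80.0),
        (100, 2, 20230101, 2, 40.0),
        (101, 3, 20230215, 1, 500.0),
    ])


def revenue_by_category(conn):
    """Answer one question about the modelled process: revenue per category."""
    return conn.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_order_line AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_orders_mart(conn)
print(revenue_by_category(conn))  # [('Accessories', 120.0), ('Lenses', 500.0)]
```

The point of the sketch is the ordering of decisions: the process and grain are fixed first, and the dimensions and metrics follow from them. The resulting tables only answer questions about this one process.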

Top-down: The Inmon method

The most commonly cited top-down approach is known as the Inmon method. This method starts, instead, with a very broad view of the data that is generated across the business. If you are designing a data warehouse according to the Inmon method, you are essentially taking an inventory of all of the operations in the business that generate (or could generate) data, and from that designing a general data warehouse that would be the go-to resource for anyone looking to answer business questions using data.

The name comes from Bill Inmon, who is credited with inventing the data warehouse as we know it today. He outlined his concept of a data warehouse in works such as his 1992 book Building the Data Warehouse, and a 2008 follow-up, DW 2.0: The Architecture for the Next Generation of Data Warehousing.

The phrase I most often hear that ties to the top-down approach is the “single source of truth”. Therein lies the appeal of this method: you end up with one big database that represents the set of all facts about the business. If you want to do any analysis on the business, you go to those records. The records are normalized and linked across the web of business operations, such that the answers you get should always be consistent with one another.

Building a data warehouse this way means attempting to collect all of the facts about the business at the outset and producing one primary well of data from which data is consumed.
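As a rough illustration of what “normalized and linked” buys you, the sketch below stores each fact once in a small set of related tables and then answers two different questions from those same records. It uses Python's standard-library sqlite3 module. The entities (customer, product, orders, order_line) and the sample rows are hypothetical, chosen by me for the example, and this is not a rendering of Inmon's own designs.

```python
import sqlite3


def build_enterprise_store(conn):
    """Create a small normalized store; each fact is recorded exactly once."""
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            region      TEXT
        );
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            unit_price REAL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id)
        );
        CREATE TABLE order_line (
            order_id   INTEGER REFERENCES orders(order_id),
            product_id INTEGER REFERENCES product(product_id),
            quantity   INTEGER
        );
    """)
    # Made-up sample rows for the illustration.
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Ada", "East"), (2, "Grace", "West")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "Tripod", 80.0), (2, "Prime Lens", 500.0)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(100, 1), (101, 2), (102, 1)])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                     [(100, 1, 1), (101, 2, 1), (102, 1, 2)])


def revenue_by_region(conn):
    """A question a sales team might ask."""
    return conn.execute("""
        SELECT c.region, SUM(l.quantity * p.unit_price)
        FROM order_line AS l
        JOIN orders   AS o ON o.order_id = l.order_id
        JOIN customer AS c ON c.customer_id = o.customer_id
        JOIN product  AS p ON p.product_id = l.product_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()


def revenue_by_product(conn):
    """A question a merchandising team might ask."""
    return conn.execute("""
        SELECT p.name, SUM(l.quantity * p.unit_price)
        FROM order_line AS l
        JOIN product AS p ON p.product_id = l.product_id
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_enterprise_store(conn)
by_region = revenue_by_region(conn)
by_product = revenue_by_product(conn)
print(by_region)   # [('East', 240.0), ('West', 500.0)]
print(by_product)  # [('Prime Lens', 500.0), ('Tripod', 240.0)]
# Both breakdowns are derived from the same records, so the totals agree.
print(sum(r for _, r in by_region) == sum(r for _, r in by_product))  # True
```

Because the price lives in one place and every order line links back to it, the two breakdowns cannot drift apart. That consistency is the appeal; the cost is that all of these entities have to be modelled before anyone gets an answer.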

Lower Manhattan Skyline at Night, New York, New York, USA – Fujifilm X-T5 – 2023

Comparing approaches

Typically, when trying to decide between these approaches, organizations consider the pros and cons of each approach:

This is not an exhaustive list of the pros and cons of each, and there is room for disagreement with my assessment here. However, in general, this comes down to the bottom-up approach being more agile, but a bit more chaotic, while the top-down approach is more dogmatic, but stable and complete.

One benefit of the Kimball approach, and part of its purpose, comes from the fact that its smaller footprint implies lower costs for the storage, network bandwidth and computational processing required to operate on that data. I have left this out, as these concerns are decreasing in relevance. All of these commodities have become much less expensive than they were even when the most recent iterations of these approaches were released.

Traffic Under the Train, Chicago, Illinois, USA – Fujifilm X-Pro3 – 2023

Current practices

For organizations that are aiming to be “agile”, whether in development workflows or in operational ones, the Kimball approach seems to be favoured. I seldom see business stakeholders actually asking for a top-down approach to the data, since the bottom-up approach brings answers sooner, and produces something more flexible. Even when CMOs are complaining about discrepancies in data and wanting a “single source of truth”, they’re also looking for very practical data products. Usually they mean something simpler: they just don’t want the data to be confusing.

On the other hand, I frequently hear IT stakeholders offer their vision of consolidated data warehousing using a top-down approach, since their concerns are largely based on keeping the system reliable and stable, and having a single source of truth is a primary goal.

In my experience, if the tasks associated with data engineering fall to a team with IT concerns, they are more likely to attempt a top-down approach where they can predictably maintain one coherent system.

Much of my work is in the realm of digital analytics, specifically. Since this type of data is heavily dependent on explicit collection of the data through implementation of web and app tracking, the overwhelming majority of analysts here prefer the bottom-up approach. I frequently see digital analysts advocating that all of the implemented tracking and reporting be based on business needs and questions people are actually asking. This can be fueled in part by the fact that tracking websites and mobile apps places some demands on the browser or device, and so tracking everything can degrade the user experience within them.

There’s a rational desire for a “single source of truth”, of course. Data consumers with a business mindset live in a world where there are many products contributing to the overall picture they need to look at. Organizations that employ some combination of platforms for advertising, marketing automation, CRM, ecommerce experiences, digital analytics and other tools of digital business have data in a great many places. That can be overwhelming, and it leaves analysts and stakeholders in that environment wanting a consolidated view.

San Francisco Rising, San Francisco, California, USA – Fujifilm X-T5 – 2023

Why this is a false dichotomy

Top-down or bottom-up. Well, there’s a lot of space between the top and the bottom, and that leaves all kinds of places to start.

The Kimball method seems to be especially popular in some circles, because the bottom-up approach still builds up to a top. As business processes are modelled, they contribute to a larger whole. It is fairly easy to imagine a sort-of bottom-up approach in which there’s some sort of design being built towards, but it is a flexible vision, and the building of the parts is still highly focused and pragmatic. It might be harder to imagine a mostly top-down approach. Much of the impetus behind taking this approach will discourage improvisation, after all.

If you want to favour a bottom-up approach because it’s more practical, though, the now-kind-of-traditional Kimball method doesn’t quite go far enough. If top-down and bottom-up were truly a dichotomy, then the Kimball-esque data marts wouldn’t be all that polished. If we’re starting at business needs, then we need to start earlier.

I don’t know about you, but I’ve never really seen a business problem that kicked off with the details already worked out. And that’s largely why this can’t truly be the split. The so-called bottom-up approach we’ve described isn’t starting at the bottom. There’s more down, down there.

This series continues in Part II: Data in a vault at the bottom of a lake.

Cover: Mirror Wall Downtown, Ottawa, Ontario, Canada – Fujifilm X-T5 – 2022

I’m Head of Product at Napkyn, a provider of digital analytics and media solutions. I’m also a father and a photographer, and I have a background in philosophy. This is my site.