Unraveling Data: The Modern Data Stack

Data is ubiquitous in today’s world, and the more information there is, the more chances it has to tangle and turn into an unusable mess. The Modern Data Stack is here to solve that problem.

The world runs on data. We all know it by now. And if it is one of the most important assets in modern society, then we should care deeply about it, right? Even more so if we are using data to improve our business and make better decisions. And yet, every day we see businesses with data scattered all over the place, jumping through confusing hoops and getting lost in translation. And so decision-making goes back to square one: intuition.

I bet to some of you that might sound familiar.

But there are tools and systems that can organize the mess your data might be in and unravel it. Enter: the Modern Data Stack.

By now (and if you read the title of the piece, duh) you probably know that what we are talking about is not a single SaaS tool, platform or technology, but rather a well-thought-out combination of these, so that you can store, manage and use your data in ways that are truly meaningful for your organization.

Why should you care?

You should probably care if any of these apply to you or your company:

  • You have your data in multiple data sources and struggle to see stuff from various sources at the same time.
  • You have to deal with duplicates and outdated data frequently.
  • Data cleaning and transformation (or the lack thereof) is a pain or, even worse, keeps you from visualizing and analyzing the data for better decision-making.
  • Reporting and analytics are not giving you the insights you are looking for.

So what does a Modern Data Stack look like?

As we already established, a modern data stack consists of a set of technologies working together. These, of course, are not all doing the same thing and don't necessarily work all at the same time. So let’s take a quick look at the parts, shall we?

Data Sources

Where the data is coming from. Someone entered their email address on your website and became a lead in HubSpot? Then HubSpot is the data source for your leads. There was a transaction in Stripe? The data source for that new information will, of course, be Stripe. Do you have some other stuff in PostgreSQL? Then that is also a data source. As you can see, data can be scattered over multiple sources. The more your business grows, the more data sources you are likely to have.

Ingestion

This is the process in which the data generated by your numerous sources is moved into data storage. The three main players currently handling this stage are Fivetran, Stitch and Segment.
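To make the idea concrete, here is a minimal sketch of ingestion in Python. It is not how Fivetran, Stitch or Segment work internally. The source records, the table names and the ingest helper are all made up for illustration, and an in-memory SQLite database stands in for the storage.

```python
import sqlite3

# Hypothetical records standing in for what two sources might return.
HUBSPOT_LEADS = [
    {"email": "ana@example.com", "created": "2021-05-01"},
    {"email": "bo@example.com", "created": "2021-05-02"},
]
STRIPE_CHARGES = [
    {"email": "ana@example.com", "amount_cents": 4900},
]


def ingest(conn, table, rows):
    """Copy rows from one source into a raw table, without changing them."""
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


# Assumption: an in-memory SQLite database plays the role of the storage.
warehouse = sqlite3.connect(":memory:")
loaded = {
    "raw_hubspot_leads": ingest(warehouse, "raw_hubspot_leads", HUBSPOT_LEADS),
    "raw_stripe_charges": ingest(warehouse, "raw_stripe_charges", STRIPE_CHARGES),
}
```

Notice that nothing gets cleaned here. Ingestion only moves the data, so each source ends up in its own raw table.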

Warehousing or Storage

The name gives it all away. This stage is where all data from all sources is stored. There are big names in the data warehouse field like Snowflake, Redshift and BigQuery. However, earlier-stage companies often use read-only replicas of their databases as their data storage instead.

Transformation and Modeling

At this point in the data stack, your information is molded into understandable data so that even non-technical users can explore what’s in it without the help of an engineer. If data cleaning and transformation is your main pain point, then this might be the part that changes it all for the better. Right now, the main tool for this step of the stack is dbt.
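As a rough sketch of what a transformation can look like, the Python snippet below cleans up duplicated leads and combines them with payments into one customers view. It does not use dbt. It only borrows the general idea of defining a model as a SQL SELECT. The table names, the sample rows and the in-memory SQLite database are assumptions made for illustration.

```python
import sqlite3

# Assumption: raw tables as they might look right after ingestion,
# including a duplicated lead with messy casing and spacing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_leads (email TEXT, created TEXT);
    INSERT INTO raw_leads VALUES
        ('Ana@Example.com', '2021-05-01'),
        ('ana@example.com ', '2021-05-03'),
        ('bo@example.com', '2021-05-02');
    CREATE TABLE raw_charges (email TEXT, amount_cents INTEGER);
    INSERT INTO raw_charges VALUES
        ('ana@example.com', 4900),
        ('ana@example.com', 1500);
""")

# The "model": one row per customer, with a normalized email,
# the first time we saw them, and their total spend.
conn.executescript("""
    CREATE VIEW customers AS
    WITH clean_leads AS (
        SELECT LOWER(TRIM(email)) AS email, MIN(created) AS first_seen
        FROM raw_leads
        GROUP BY LOWER(TRIM(email))
    ),
    revenue AS (
        SELECT LOWER(TRIM(email)) AS email, SUM(amount_cents) AS total_cents
        FROM raw_charges
        GROUP BY LOWER(TRIM(email))
    )
    SELECT l.email, l.first_seen, COALESCE(r.total_cents, 0) AS total_cents
    FROM clean_leads AS l
    LEFT JOIN revenue AS r ON r.email = l.email;
""")

customers = conn.execute(
    "SELECT email, first_seen, total_cents FROM customers ORDER BY email"
).fetchall()
```

Three messy lead rows become two clean customers, and anyone who can write a simple query can now explore them without knowing how the raw tables are shaped.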

Analysis

It is very likely that this is the stage that most users are familiar with. Data visualization and business intelligence tools operate in this space, and they specialize in creating dashboards and other visual representations where data can be analyzed and monitored. There are plenty of tools for this part of the stack, but Tableau, Looker, Metabase and Mixpanel are some of the most popular at the moment.

Operationalization

God, that name is a mouthful! Some are now calling it “reverse-ETL”, as it takes the data from the storage or warehouse back to a system where it can be operational (where it can finally be useful!). You might sync leads’ data to your CRM, for instance. Hightouch and Census are the main players in this area of the stack at the time of writing.
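Here is a minimal sketch of that CRM sync in Python. It is not how Hightouch or Census are implemented. FakeCrm, upsert_contact, sync_to_crm and the lifetime_value_cents property are hypothetical names, and an in-memory SQLite table stands in for the warehouse.

```python
import sqlite3


class FakeCrm:
    """Hypothetical stand-in for a CRM client; keeps contacts in a dict."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, properties):
        # Create the contact if it is missing, otherwise update its properties.
        self.contacts.setdefault(email, {}).update(properties)


def sync_to_crm(conn, crm):
    """Read modeled rows from the warehouse and push them to the CRM."""
    rows = conn.execute("SELECT email, total_cents FROM customers").fetchall()
    for email, total_cents in rows:
        crm.upsert_contact(email, {"lifetime_value_cents": total_cents})
    return len(rows)


# Assumption: a small, already-modeled customers table in the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (email TEXT PRIMARY KEY, total_cents INTEGER);
    INSERT INTO customers VALUES ('ana@example.com', 6400), ('bo@example.com', 0);
""")

crm = FakeCrm()
crm.upsert_contact("ana@example.com", {"name": "Ana"})  # contact that already exists
synced = sync_to_crm(conn, crm)
```

The sync uses an upsert so that contacts already in the CRM keep their existing properties and only gain the new one, while missing contacts get created.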


Modern Data Stacks are already gaining traction in a lot of companies as data becomes more ubiquitous in our workplaces. Have you dipped your toes into the world of the Modern Data Stack? You don’t want to be the last one to get to the party.