The modern data stack (MDS) is a collection of point solutions businesses are cobbling together to aggregate, manage and analyze data. We’ve already documented the issues with this approach here and here. In this blog, we’ll focus on the DIY MDS approach’s fatal flaw: high cost and low ROI.
If you are embarking on the path of purchasing a data warehouse and hiring a data analyst to create cross-departmental reporting and analytics, you can expect to invest an additional $50k minimum in the tools they need to do their job: a data warehouse to store the data, ETL tools to get data into the data warehouse, transform tools and compute credits to get your data in the shape you need, and ultimately a BI tool to create the dashboards and reports you desire.
Then the fun begins as your new hire(s) will invest an additional 4-6 months cobbling together these bespoke tools and wrangling your data just to make it presentable. And at some point, you’ll want to activate the data and insights stored in the warehouse, which requires another tool to master and another vendor to manage.
We’ve broken down the cost of this approach in the table below.
The True Cost of the Modern Data Stack
What is it? | How much does it cost? | |||
Year 1 (projected) |
Year 1(actual) | Year 2 | Year 3 | |
Data Storage (Aggregates and stores data.)Sample vendors: PostgreSQL, MySQL, MongoDB, Amazon S3, Snowflake | $2,000 | $3,000 | $10,000 | $20,000 |
Data Processing / Compute (Operations run on data stored in a cloud data warehouse.)Sample vendors: dbt, Databricks, BigQuery | $20,000 | $40,000 | $80,000 | $150,000 |
ETL (Extract, Transform, Load) (Moves data from business systems into a data store and transforms it into a common format for reporting.)Sample vendors: Fivetran, Stitch, Apache Airflow | $12,000 | $20,000 | $40,000 | $75,000 |
Reverse ETL (Activates data in the data store by sending it back to business systems.)Sample vendors: Hightouch, Census | $8,000 | $20,000 | $30,000 | $50,000 |
Analytics (For creating reports and dashboards and performing ad-hoc analysis.)Sample vendors: Looker, Power BI, Tableau | $1,000 | $1,000 | $6,000 | $40,000 |
Observability / Catalog (Monitor and manage data and metadata across tools.)Sample vendors: Atlan, Monte Carlo, data.world, Informatica | N/A | N/A | N/A | $50,000 |
Technology Cost Subtotal | $53,000 | $84,000 | $166,000 | $335,000 |
Personnel (The humans required to build, manage, and maintain the various components of your data stack)Example titles: Data engineer, BI engineer, Analytics manager, Data analyst, Data scientist, Data leader | $200,000 | $250,000 | $600,000 | $2,000,000 |
Technology + Personnel Costs Total | $253,000 | $334,000 | $766,000 | $2,335,000 |
Why do projected costs and actual costs vary?
It is almost impossible for data and operations professionals to predict how much compute and data processing they will need in a given year. And this unpredictability shows up in multiple places in the modern data stack: Compute costs on top of your warehouse, ETL processing costs (Monthly Active Rows), and Reverse ETL distribution costs (Number of destination fields). Sid Metcalfe in his hardware section elaborates on some of these challenges for different types of e.g. motherboards and other types of hardware.
It’s a clever business model deployed by these companies. Data storage is cheap, that’s what baits you in. Their profit comes from this highly variable expense – no different from how companies that sell razors profit off razor blades or how companies that sell printers profit off the ink. That’s good for them, not for you.
Syncari’s pricing model is simple – your price is based on the number of unified records you have under management. That means unlimited transactions (e.g. compute).
Pro tip: Ask vendors how much it costs to resync data across systems – not just THEIR cost, but THEIR cost + the cost of other elements of the MDS.
Key takeaways about the cost of MDS
- Be wary of unpredictable and explosive compute and data processing costs in the DIY MDS approach.
- Modern data stacks require expensive data teams – something most can’t afford. If you can, recruiting and hiring introduces a lag in time to value.
- The R&D required to assemble this stack (learning new tools, context switching between tools, managing multiple vendor relationships) will take time and energy from your team that would otherwise be spent running your business. Opportunity cost is significant.