A Brief History of Today’s Data Woes—And Why A Radical Rethinking Is The Only Way Forward

If you feel like your data is in a state of chaos, you’re not alone. Ninety-four percent of businesses suspect that their customer and prospect data is inaccurate. Even if you’ve recently done a cleanse, customer data decays at a rate of 30% annually, meaning 3 out of every 10 leads you collect will go bad every year. Bad data drains $3.1 trillion out of the U.S. economy every year. And the path to clean and accessible data is often fraught with organizational misalignment and broken promises from vendors.

To understand why there’s still no good answer to today’s bad data problem, you need to recognize where the issue originates, because the mindsets that created this problem and the infrastructure we use today aren’t about to solve it. Freed from the old way, we can reimagine a future where revenue leaders can easily access data they can trust.

Let’s rewind a couple of decades, shall we?

The early days of computer science

Today’s data problems started as soon as two computer applications needed to talk. In those days, Unix and related operating systems were de rigueur, and when applications needed to share data, pipelines were custom-built to transfer output data from one program as input to another. 

The next wave of technologies, RPCs (remote procedure calls), was designed so that pairs of applications residing on different computers could communicate in a standard way. This is where early patterns of point-to-point connectivity were set. And issue number one crept in and became an obsession among technologists: How can we get these two applications to speak more easily? What if there are more than two applications? What if the applications keep moving from one computer to another? With all this noise, whether what these applications were saying was useful was a secondary consideration.

Software spoke directly to other software in its own secret language in the 1990s … it was a simpler time.

Object request brokers were the next innovation, meant to solve some of these communication problems. But the obsession persisted: How do we get application A to talk to application B? The programs developed in this era, which now serve as the foundation for how our computers work, were just as happy transferring the wrong data as they were the right data. It was up to users to carefully monitor the data flowing through the interconnected systems.

Since nobody knew any better, most companies accepted these hardships as the norm. Nobody then could see all the issues this approach would create as businesses adopted more and more software.

The emergence of middleware in the 2000s

Salesforce paved the way for an entirely new cloud-based software model in 1999, and hundreds of software companies sprang up. As SaaS proliferated, so did the amount of data being generated. 

As all these new software programs needed to exchange that data and trigger actions, it became too difficult and too expensive to build custom integrations every time. The market for middleware—or out-of-the-box software built to funnel data between applications—was becoming huge. But all of this middleware was built with the “connector” philosophy: Move it now, sort it later.

Big companies like IBM, TIBCO, and Informatica also started to build their own middleware. However, as software vendors, they always put their own products at the center of the universe, without any careful consideration of how to resolve conflicts between data, decide which of a pair of duplicates to keep, or make sure the data formats were correct. Without a holistic approach to data management, data quality was never the goal, and therefore it was never achieved.
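To make the missing step concrete, here is a minimal sketch in Python of the kind of decision a connector skips: checking a field's format and choosing which of two duplicate records to keep. The record fields, the `pick_survivor` rule, and the crude email check are all assumptions made for illustration, not any vendor's actual logic.

```python
from datetime import date

def is_valid_email(value: str) -> bool:
    """Deliberately crude format check: something@domain.tld (illustrative only)."""
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and "." in domain

def pick_survivor(a: dict, b: dict) -> dict:
    """Hypothetical rule: prefer the record with a valid email,
    then break ties by the most recent update."""
    a_ok, b_ok = is_valid_email(a["email"]), is_valid_email(b["email"])
    if a_ok != b_ok:
        return a if a_ok else b
    return a if a["updated"] >= b["updated"] else b

# Two invented copies of the same contact held in two different systems.
crm_record = {"email": "jo@example.com", "updated": date(2020, 1, 5)}
marketing_record = {"email": "jo@example", "updated": date(2020, 3, 1)}

survivor = pick_survivor(crm_record, marketing_record)
print(survivor["email"])
```

In this example the older record wins because the newer one has a malformed email address. A connector that simply copies the latest value would have overwritten good data with bad.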

With these technologies, you could shovel all of your Marketo data into Salesforce and all your Salesforce data back into Marketo, but nobody stopped to consider whether or not you needed all of this data available in both systems. The connector certainly didn’t care. Suddenly you had siloed departments with data earth-movers inadvertently polluting each other’s systems. With no notion of governance, this led to a state of data anarchy.

Meanwhile, the next generation of vendors was focused on process automation, with companies like Workato, Zapier, and Automate.io leading the charge. Data was being transferred back and forth quickly and efficiently—reducing the manual effort on the part of workers. But they were still using the connector mindset. Without considering data, these solutions were reintroducing a ton of manual effort on the backend to fix bad data within departmental silos. 

Workflow automation tools solved a critical pain point at the time, but they never addressed the fundamentally flawed assumption underneath. There was no concern for data quality or for coordinating data as it moved around. As a result, they contributed to the compounding problem of bad data that was beginning to cannibalize everyone’s revenue.

The SaaS explosion of the 2010s

A dizzying number of software startups has exploded onto the scene in the past decade, with new players springing up seemingly every day. There are over 8,000 marketing technology companies alone. The typical company’s SaaS adoption has grown fourfold in the past 20 years, to an average of 137 different applications.

Where multipurpose tools like ERPs once spanned multiple departments, tech stacks are becoming increasingly fractured and varied, with each department buying dozens of software subscriptions. 

What does this mean for the state of data? This SaaS explosion is pouring gasoline on the fire of these nascent data problems. The data problems that were slowly developing are now multiplying at an unprecedented rate. The same pairwise math that underlies Metcalfe’s Law applies here: the number of possible point-to-point connections grows with the square of the number of systems, so each system you add brings more new connections than the last, further obfuscating an already nightmarishly complex landscape.
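To put rough numbers on that growth: if every system could be wired directly to every other one, the number of possible pairings among n systems is n(n-1)/2. The short Python sketch below computes it. The function name is illustrative, and the full-mesh assumption is a worst case, since few companies connect every application to every other.

```python
def point_to_point_links(systems: int) -> int:
    """Count the distinct pairs among `systems` applications: n * (n - 1) / 2."""
    if systems < 0:
        raise ValueError("systems must be non-negative")
    return systems * (systems - 1) // 2

# 137 is the average application count cited above.
for n in (2, 10, 50, 137):
    print(f"{n} systems -> {point_to_point_links(n)} possible connections")
```

Under that worst-case assumption, 10 systems allow 45 possible connections, while 137 systems allow 9,316.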

Data quality issues have become unignorable as business leaders realize how much manual effort their teams undertake. The average marketer spends 800 hours per year cleaning lists, and the average salesperson spends 900 hours per year on administration. 

Both CDP (customer data platform) and MDM (master data management) solutions emerged in response to these problems, aiming to centralize data in one place. But this thinking is flawed for a number of reasons. First of all, data needs to live in its natural habitat to be accessible and actionable for the various teams it serves. Second, manual effort, set-up, and upkeep make these systems unwieldy to maintain.

But the volume of data only keeps growing at a staggering rate. Cisco predicts that by 2021, 75% of workloads will be SaaS-only. As the data chaos continues to worsen, we need a better way, or companies will continue to lose 15-25% of their revenue to these mistakes.

To rise above 2020’s state of data chaos, we need a radical shift of perspective

When siloed departments each prioritize the small slice of the customer journey they own, they often end up working against each other. There’s a parable about a group of blind men all feeling an elephant. One feels the elephant’s leg and concludes he’s touching a pillar. Another feels its trunk and is convinced he’s feeling a water spout. Yet another feels its ear and believes he is touching a fan. Ultimately, nobody is aware of the bigger picture. They’ve got an elephant in front of them, but nobody knows it.

Similarly, when marketing cares about MQLs, sales cares about closing deals, and customer success cares about renewals, few realize they’re all touching the same customer journey. A single customer is one flesh-and-blood entity, and yet your marketing, sales, and customer support systems likely see this person as three distinct records.
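A small Python sketch shows how one person ends up as three records, and how a shared key can reveal that they are the same customer. The records, field names, and the choice of a normalized email address as the matching key are all assumptions for illustration. Real identity resolution usually needs more than one signal.

```python
from collections import defaultdict

# Invented records, as three departmental systems might store them.
records = [
    {"system": "marketing", "email": "Pat.Lee@Example.com ", "name": "Pat Lee"},
    {"system": "sales", "email": "pat.lee@example.com", "name": "Patricia Lee"},
    {"system": "support", "email": "PAT.LEE@EXAMPLE.COM", "name": "P. Lee"},
    {"system": "sales", "email": "sam@example.com", "name": "Sam Roy"},
]

def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()

def group_by_person(rows):
    """Map each normalized email to the list of systems holding a record for it."""
    people = defaultdict(list)
    for row in rows:
        people[normalize_email(row["email"])].append(row["system"])
    return dict(people)

people = group_by_person(records)
for email, systems in people.items():
    print(email, systems)
```

Compared as raw strings, the three Pat Lee records look like three different people. Once the email is normalized, they collapse into one customer seen by marketing, sales, and support.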

Here’s the reality: Our mental models about data need an update for 2020. Trying to solve today’s data problem with yesteryear’s tools and mindsets simply isn’t productive. We need a paradigm shift. Companies need a bird’s-eye view of all of their data, thought through holistically using global data models.

Why is it so hard to access clean, reliable data? Because up until now, vendors have not positioned this goal as their North Star. It’s going to take a new generation of technology vendors with a radically different approach to finally solve for the only thing that actually matters: clean data.

It’s not about adding more integrations. It’s about zeroing in on data modeling and the problem of data quality specifically, and asking “What would it take for all departments to be able to access clean and correct data they can rely on to inform decision-making?” Once we can frame the question in the right way, we’re closer to a solution than we’ve ever been before. Breaking free from the data connector mindset is the first step towards recovering revenue that’s needlessly squandered by bad data.


About the Author: Neelesh is the CTO/cofounder at Syncari and a technologist with more than 20 years of experience in building large-scale SaaS applications. Previously, he worked at Marketo, where he led many major technology transformation efforts. Before that, he was at Oracle and SuccessFactors. He loves being on the edge of product and technology.