Data is like a goldfish. At first, it may seem harmless. But in the wrong ecosystem, it can wreak havoc. A small town in Ontario, Canada recently found that thousands of goldfish had been dumped into its local lake, devastating the lake’s ecosystem. The goldfish monopolized food sources, introduced diseases, and damaged the habitats of other species. They also grew far larger than they would in the confines of an aquarium, up to a hulking 8 pounds! Now there’s no stopping them.
Similarly, you may not see your data as a threat. But when introduced into systems that it’s not native to, it can lead to unintended—and dire—consequences. For example, if outdated CRM data is overwriting clean data in your marketing automation platform, you’re losing valuable revenue opportunities.
While it might seem simple to improve data quality by centralizing everything into a data warehouse, customer data platform (CDP), or master data management (MDM) solution, these technologies are like a zoo for your data. They pull data out of its natural habitat and bring it together in one “convenient” place, where none of the inhabitants truly thrives.
In this blog, I’ll shed light on three major problems that can arise when you centralize data outside its native application, plus present an alternate approach that’ll keep your data thriving.
1. Different versions of your data create misalignment
There’s a common misconception that data lakes are the way to achieve unified data. But this approach is deeply flawed, since they force you to duplicate your data. As you attempt to improve data quality with updates or enrichment over time, the data points in your data lake will no longer match their versions in operational systems. You’ve effectively created an evil twin for each piece of data, and it becomes tricky to distinguish which system has the good twin.
The problem becomes impossible to ignore when organizational leaders use BI tools to run cross-functional reports based on the data lake. But marketers still run reports out of their MAPs, sales leaders run reports out of their CRMs, and finance tallies bookings based on the ERP. Since the underlying data is inconsistent, none of these numbers add up. Suddenly, marketing’s leads, sales’s opportunities, and finance’s bookings each give a very different view of what should be one cohesive customer journey.
“If everyone’s coming to the boardroom with different data sets, it causes huge problems for the business,” says Cristina Saunders, Co-Founder of CS2 Marketing. “If you can’t even agree on the basic facts, you’re utterly unequipped to advance critical business initiatives.”
So much for bidirectional sync
2. Improving data quality becomes a nightmare with errors propagated widely
Imagine you’re enjoying a nice dinner when your elbow knocks over your wine glass, spilling merlot all over the carpet. Now imagine you start scrubbing the carpet while wine is still dripping from the table. No matter how hard you scrub, you’ll never get it all up, because the source of the mess hasn’t been addressed. Thomas Redman, the author of Data Driven, calls this the accommodation problem: wherever there are errors, employees spend valuable time correcting them where they encounter them, while neglecting to correct them at the origin.
Similarly, when your data lives in multiple systems, any given cleanup project only targets part of the problem. Achieving unified, complete, and trustworthy data becomes a moving target. To be thorough, you may have to clean the same mess in multiple spots. For example, let’s say you want to purge all non-business email addresses in your system because it’s impossible to get the enrichment to turn them into marketable leads. (Besides, how likely was firstname.lastname@example.org to make a purchase anyways?) You could mass delete these records from your CRM, but they’ll still exist in your MAP and any other integrated systems.
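To make the duplication concrete, here’s a minimal Python sketch of that kind of purge. The domain list and record shape are hypothetical, and the key point is that the same filter would have to be rerun against every integrated system (CRM, MAP, and so on), not just one:

```python
# Hypothetical free-mail domains to purge; a real list would be far longer.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "example.org"}

def is_business_email(email: str) -> bool:
    """Return True if the address does not use a known free-mail domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain not in FREE_MAIL_DOMAINS

def purge_non_business(records: list[dict]) -> list[dict]:
    """Keep only records with business email addresses.

    In practice this must run against every integrated system's copy of
    the data, otherwise the purged records linger everywhere else.
    """
    return [r for r in records if is_business_email(r["email"])]

records = [
    {"email": "jane@acme.com"},
    {"email": "someone@gmail.com"},
]
print(purge_non_business(records))  # only the acme.com record survives
```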
Metcalfe’s Law reminds us that the number of possible connections between systems grows quadratically as you add more of them. An extra few SaaS tools become dozens more places to track down and clean up dirty data. Suddenly, you have a much more complicated cleanup process on your hands to remove the bad data from all systems, unless you have a data automation tool like Syncari that lets you delete centrally.
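As a back-of-the-envelope illustration: with n systems connected point-to-point, there are n(n-1)/2 possible sync paths, so the cleanup surface grows quadratically with each tool you add:

```python
def pairwise_connections(n: int) -> int:
    """Number of point-to-point links among n fully connected systems."""
    return n * (n - 1) // 2

# Each added system multiplies the places dirty data can hide.
for n in (3, 5, 10, 15):
    print(f"{n} systems -> {pairwise_connections(n)} possible sync paths")
# 3 -> 3, 5 -> 10, 10 -> 45, 15 -> 105
```

Going from 10 integrated tools to 15 more than doubles the number of paths a bad record can travel, which is why per-system cleanup stops scaling.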
3. Your data isn’t accessible to all teams who need it
The Harvard Business Review recently surveyed 464 executives. Eighty-seven percent of them reported that their employees perform better when they’re empowered with access to data that supports decision-making. How many executives were able to provide that data? Fewer than 20 percent. This gulf between what leaders know their teams need to perform optimally and what they actually deliver speaks to the difficulty of surfacing data.
And to make matters worse, even where data is available to an organization, that doesn’t necessarily mean it’s accessible. Sixty-eight percent of available data goes unleveraged at the average B2B organization.
Centralizing your data outside its owning system worsens the access problem: it plucks data out of its native application and resituates it in a system that most teams likely need data scientists to help them query. Instead of a goldfish in a pond, you’ve got one swimming around in a massive body of water, making it much harder to find.
What’s more, while your ETL tool may normalize data before it sends it to your data warehouse, your marketing team will still have to contend with chaos when they log into your MAP.
Distributed data systems—the only way to create true data harmony
So how do you keep data in its natural habitat, but still improve data quality? The answer lies in a centralized data model that lives outside of your SaaS tools, but maintains a distributed source of truth between them. If you can globally manage your data flows, but keep each dataset happily in its natural habitat, your data quality will thrive.
Syncari’s Complete Data Automation Platform
Your company has several distinct teams with their own niches in the data ecosystem, and that’s okay. The goal shouldn’t be to cram them all into one system that’s ideal for no one. Rather, let each thrive on its own terms, with some extra support from a global data management strategy to create a truly harmonious data ecosystem. Minus the gargantuan goldfish, or disfigured data.