The catastrophic cost of bad data and where it’s all headed (Part 3 of 5)

 “Automation applied to an inefficient process will magnify the inefficiency.” – Bill Gates

This is part 3 of my 5-part series about the cost of bad data. Read the previous post here.

The added threat of system-specific AI

System-specific machine learning algorithms are making all the bad things happen faster. AI is increasingly a feature of SaaS offerings, from Salesforce Einstein to NetSuite Intelligent Cloud, and spending on AI in software will reach $98 billion by 2023. Forty-five percent of businesses use some form of it.

Each algorithm is specific to the system and so prioritizes data quality within its own fiefdom without concern for others, constantly optimizing based on an incomplete picture of the business.

How AI goes awry

  • The customer support AI “corrects” contract start dates to the European format based on the convention at headquarters.
  • The sales AI updates lead scores based on prospect engagement, triggering a flood of emails from marketing.
  • The marketing AI overwrites an entire list of leads’ correct phone numbers with the generic 800 numbers drawn from a new data enrichment technology.
  • The analytics AI reformats all activity data, making it unusable to the product team.
  • The marketing AI changes the billing contacts for finance.
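
To make the first failure in the list concrete, here is a minimal Python sketch of how one stored string yields two different contract start dates depending on which convention a system assumes. The function name, the "US"/"EU" labels, and the sample value are hypothetical and chosen only for illustration; they do not come from any vendor's product.

```python
from datetime import date, datetime

# Hypothetical mapping of regional conventions to strptime formats.
FORMATS = {"US": "%m/%d/%Y", "EU": "%d/%m/%Y"}


def parse_contract_date(raw: str, convention: str) -> date:
    """Parse a slash-separated date string under an explicit convention."""
    return datetime.strptime(raw, FORMATS[convention]).date()


raw = "03/04/2019"                            # illustrative value only
us_reading = parse_contract_date(raw, "US")   # read as March 4, 2019
eu_reading = parse_contract_date(raw, "EU")   # read as April 3, 2019
drift_days = (eu_reading - us_reading).days   # gap between the two readings

print(us_reading, eu_reading, drift_days)
```

Both readings are valid dates, so neither system raises an error. The contract simply starts about a month later in one system than in the other, which is why this kind of "correction" tends to go unnoticed.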

System-specific AI poses such a dire threat because: 

Each point solution is a silo. Each AI optimizes for something different without coordinating with others, causing central data quality to decay, fast.

System-specific AIs are features, not products. Not every company has the machine learning expertise or quantity of data necessary to build useful, enterprise-grade AI systems, but many build them anyway.

Errors aren’t easily reversed. Point-solution algorithms learn, improve, and take autonomous action. But if the action is an error, teams often don’t have the tools to revert.
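
One way to picture the missing tooling is a change log that records who wrote what, so that a single actor's writes can be rolled back. The sketch below is a hedged illustration in Python, not a description of how any particular product works; `AuditedStore`, `Change`, the actor names, and the sample phone numbers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # marks a field that did not exist before the change


@dataclass
class Change:
    record_id: str
    field_name: str
    old_value: Any
    new_value: Any
    actor: str


@dataclass
class AuditedStore:
    """Hypothetical record store that logs every write with its author."""
    records: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def update(self, record_id, field_name, new_value, actor):
        record = self.records.setdefault(record_id, {})
        old_value = record.get(field_name, _MISSING)
        self.log.append(Change(record_id, field_name, old_value, new_value, actor))
        record[field_name] = new_value

    def revert_actor(self, actor):
        """Undo writes by `actor`, newest first; return how many were undone.

        A write is skipped if the field no longer holds the value the actor
        wrote, so later edits by someone else are not clobbered.
        """
        reverted = 0
        for change in reversed(self.log):
            if change.actor != actor:
                continue
            record = self.records[change.record_id]
            if record.get(change.field_name, _MISSING) != change.new_value:
                continue
            if change.old_value is _MISSING:
                del record[change.field_name]
            else:
                record[change.field_name] = change.old_value
            reverted += 1
        return reverted


store = AuditedStore()
store.update("lead-1", "phone", "415-555-0100", actor="sales_rep")
store.update("lead-1", "phone", "800-555-0199", actor="marketing_ai")
undone = store.revert_actor("marketing_ai")
print(undone, store.records["lead-1"]["phone"])
```

The design choice worth noting is that the log stores the previous value alongside the new one. Without that, knowing that an algorithm touched a record is not enough to restore it.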

Data in an ecosystem with multiple system-specific AIs decays at blistering speed, and unlike human operators, AI never sleeps. Does your team have a plan?

So how are companies dealing with these issues?

The data quality problem is so imposing that not all companies are contemplating a fix. It’s the status quo in many businesses, and its effects seem so specific to each team that it’s difficult to see it as the organizational problem that it is. Gartner writes, “New technologies and approaches are available, yet many stick to their proven practices.”

Those doing something are doing some combination of:

Buying data to fill the void

US firms spent $19 billion on third-party supplementary audience data in 2018, up 18 percent from the year before. That’s not including data cleansing, integration, and hygiene services. There were 1,120 data vendors on the MarTech Landscape chart in 2019.

Team-specific initiatives

Individual business units or team leaders purchase new data, implement new data procedures, or occasionally cordon their data off so other business units can’t foul it up. If they pay for third-party data and data cleansing or appending services, they become dependent on frequent refreshes.

“Men and nations behave wisely, but only after they’ve exhausted all other resources.” – Abba Eban

Chasing the specter of centralized data management

Customer data platforms (CDPs) have become numerous enough to warrant their own Forrester New Wave report, but they are built on a flawed premise: centralized storage. CDPs move all the data to one central location, where the end users who need it can’t easily get it back out. If I’m a marketer, it doesn’t matter to me that our email addresses are correct in the CDP if I send emails from the marketing platform. And if I want to build a bidirectional sync, I quickly run up against API call limits and excessive latency. Moving data back and forth from a central location only exacerbates the N-squared problem covered in Part 1, whereby the number of connections to maintain grows with roughly the square of the number of systems.
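
As a rough illustration of that growth, the short Python sketch below counts the pairwise links needed when every system syncs directly with every other. It assumes the standard n(n-1)/2 formula for pairs, which may not be the exact framing used in Part 1; the function name is hypothetical.

```python
def point_to_point_links(n_systems: int) -> int:
    """Number of distinct system pairs, assuming every pair needs its own sync."""
    return n_systems * (n_systems - 1) // 2


for n in (3, 5, 10, 20):
    print(f"{n} systems -> {point_to_point_links(n)} pairwise links")
```

Doubling the stack from 10 to 20 systems roughly quadruples the links, from 45 to 190, and each link carries its own field mappings, rate limits, and failure modes.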

Struggling with master data management

Like CDPs, master data management (MDM) software tries to centralize data where it can be cleaned, transformed, and redistributed. It’s been a topic of conversation for over a decade, but the concept suffers from serious limitations:

  • Storage is expensive and slow. Many MDM solutions try to capture all data in its original format to sort later, but storage is both more difficult and more expensive than the pioneers of the 2010s imagined. “There was a rush to capture all data and sort it all later,” says Rob Zare, Senior Director of Product Management at Salesforce. “But most of that data is garbage. People collected petabytes of proverbial cat pictures.” As a result, companies are growing disillusioned with Hadoop and non-relational data storage, and the sector is folding. Hortonworks and Cloudera have merged and, at the time of this writing, MapR is near bankruptcy.
  • Central control is problematic. MDM initiatives fall under IT, which often doesn’t have the luxury of spending enough time with the various lines of business to know what good data means to end users. This also makes it slow to adapt as data and data sources change.
  • MDM requires strong executive sponsorship and change management. Which is to say, there are very few instances of it working in practice.

All this has led me to believe that to truly tackle the data quality issue, companies need and want a system that borrows from each approach, but addresses the issue at its source.

About the author: Nick is a CEO, founder, and author with over 25 years of experience in tech who writes about data ecosystems, SaaS, and product development. He spent nearly seven years as EVP of Product at Marketo and is now CEO and Founder of Syncari.
