The bigger the data, it seems, the greater the chance that much of it is “bad” data.

And that bad data is costing companies a bundle, according to research published in MIT Sloan Management Review.

“Bad data is the norm,” writes Thomas Redman, president of Data Quality Solutions. Drawing on research from other big data analysts, his consulting firm estimates that bad data costs many companies between 15 percent and 25 percent of revenue.

The costs associated with bad data include the time required to correct errors, confirm data against other sources, and clean up the mess created by mistakes stemming from erroneous data, Redman calculates.

The good news, he continues, is that the root causes of errors underlying bad data can be readily pinpointed and eliminated. “All told, we estimate that two-thirds of these costs can be identified and eliminated—permanently,” Redman asserts.

As data volumes soar, a growing chorus of business consultants has been warning of declining data quality, particularly as data silos grow higher and companies struggle to integrate massive volumes of legacy data into new platforms.

“Poor-quality data is a huge problem,” Bruce Rogers, chief insights officer at Forbes Media, noted in releasing a report in May on the explosion of bad data. “It leaves many companies trying to navigate the Information Age in the equivalent of a horse and buggy.”

The Forbes/KPMG International study found that 84 percent of CEOs surveyed expressed concerns about the quality of the data used to make strategic decisions.

Other market analysts have attempted to quantify the cost of bad data. For example, Gartner Inc. estimates that poor-quality data costs the average company about $9.7 million annually.

The Forbes/KPMG survey also found that 41 percent of large enterprises said they are making data quality and analytics a priority. Among the remedies are conducting inventories to determine what data an enterprise owns, where it is stored and in what format. More companies are also using benchmarking and data audits as they shift from batch processing to real-time validation and de-duplication of redundant records.

Meanwhile, Redman of Data Quality Solutions reports that his research found that nearly half of new data records created in a study contained “critical errors.” Concludes Redman: “The vast majority of data is simply unacceptable, and much of it is atrocious. Unless you have hard evidence to the contrary, you must assume that your data is in similar shape.”

Among the consequences, he warns, is wasting the time of scarce, valuable staff such as data scientists on mundane data quality issues.

While estimates of the cost of bad data range as high as 25 percent of company revenue, the overall cost to the U.S. economy could run as high as $3.1 trillion, according to an IBM (NYSE: IBM) estimate cited by Redman.

Written by George Leopold, first published on the Datanami website, December 13, 2017.