Tuesday, January 7, 2020

7 Steps to Ensure and Sustain Data Quality

Written by Stephanie Shen · Jul 28, 2019 · 10 min read

Several years ago, I met a senior director from a large company. He mentioned that his company was facing data quality issues that eroded customer satisfaction, and that he had spent months investigating the potential causes and how to fix them. “What have you found?” I asked eagerly. “It is a tough issue. I did not find a single cause; on the contrary, many things went wrong,” he replied. He then cited a long list of factors that contributed to the data quality issues. Almost every department in the company was involved, and it was hard for him to decide where to begin. This is a typical case when dealing with data quality, which is directly tied to how an organization does its business and to the entire life cycle of the data itself.

Before data science became mainstream, data quality was mostly discussed in the context of reports delivered to internal or external clients. Nowadays, because machine learning requires large amounts of training data, the internal datasets within an organization are in high demand. In addition, analytics teams are always hungry for data and constantly search for data assets that can potentially add value, which has led to the quick adoption of new datasets or data sources not explored or used before. This trend has made data management and sound practices for ensuring data quality more important than ever.

The goal of this article is to give you a clear idea of how to build a data pipeline that creates and sustains good data quality from the beginning. The underlying premise is that data quality cannot be fundamentally improved just by finding problems and fixing them after the fact. Instead, every organization should start by producing good-quality data in the first place.

