Why No One Cares about Poor Data Quality

OCDQ Radio is an audio podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Why does no one care about poor data quality? Because you’re probably measuring data quality without connecting it to your organization’s business processes, applications, or other business uses for enterprise data.

During this episode, I discuss how this is accomplished through the implementation of a data governance policy as an executable process: a combination of business rules and data rules that create and track meaningful data quality metrics, framed within a relevant business context and associated with a data quality threshold (i.e., a tolerance for poor data quality). Each business use for enterprise data should be governed by its own policy. Compliance with these data governance policies aligns data quality with business insight, providing the missing link between poor data quality and poor business performance. And it is then—and only then—that anyone cares about poor data quality.
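
To make this concrete, here is a minimal sketch (in Python, with hypothetical field names, rules, and threshold values) of what one such executable policy might look like: a data rule derived from a business rule, a metric calculated over the records it governs, and a threshold expressing the tolerance for poor data quality.

    # Hypothetical sketch: a data governance policy as an executable process.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Policy:
        name: str                       # the business use being governed
        rule: Callable[[dict], bool]    # data rule derived from a business rule
        threshold: float                # tolerance for poor data quality

    def measure(policy, records):
        """Return the data quality metric (pass rate) and compliance status."""
        passed = sum(1 for record in records if policy.rule(record))
        pass_rate = passed / len(records) if records else 1.0
        return pass_rate, pass_rate >= policy.threshold

    # Business rule: invoices cannot be mailed without a deliverable address.
    # Data rule: postal_code must be a five-digit string.
    billing_policy = Policy(
        name="Billing addresses must be deliverable",
        rule=lambda r: str(r.get("postal_code", "")).isdigit()
                       and len(str(r.get("postal_code", ""))) == 5,
        threshold=0.98,
    )

    customers = [{"postal_code": "02134"}, {"postal_code": ""}]
    pass_rate, compliant = measure(billing_policy, customers)
    print(f"{billing_policy.name}: {pass_rate:.0%} passed, compliant = {compliant}")

Each business use would get its own policy object, and the pass rate only becomes a meaningful data quality metric because the threshold frames it within that business context.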

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Quality and Chicken Little Syndrome

“The sky is falling!” exclaimed Chicken Little after an acorn fell on his head, causing him to undertake a journey to tell the King that the world is coming to an end.  So says the folk tale that became an allegory for people accused of being unreasonably afraid, or people trying to incite an unreasonable fear in those around them, sometimes referred to as Chicken Little Syndrome.

The sales pitches for data quality solutions often suffer from Chicken Little Syndrome, when vendors and consultants, instead of trying to sell the business benefits of data quality, focus too much on the negative aspects of not investing in data quality, and try scaring people into prioritizing data quality initiatives by exclaiming “your company is failing because your data quality is bad!”

The Chicken Littles of Data Quality use sound bites like “data quality problems cost businesses more than $600 billion a year!” or “poor data quality costs organizations 35% of their revenue!”  However, the most common characteristic of these fear mongering estimates about the costs of poor data quality is that, upon closer examination, most of them either rely on anecdotal evidence, or hide behind the curtain of an allegedly proprietary case study, the details of which conveniently can’t be publicly disclosed.

Lacking a tangible estimate for the cost of poor data quality often complicates building the business case for data quality.  Even though a data quality initiative has the long-term potential of reducing the costs, and mitigating the risks, associated with poor data quality, its initial costs are very tangible.  For example, the short-term increased costs of a data quality initiative can include the purchase of data quality software, and the professional services needed for training and consulting to support installation, configuration, application development, testing, and production implementation.  When considering these short-term costs, and especially when lacking a tangible estimate for the cost of poor data quality, many organizations understandably conclude that it’s less risky to gamble on not investing in a data quality initiative and hope things are just not as bad as Chicken Little claims.

“The sky isn’t falling on us.”

Furthermore, citing specific examples of poor data quality also doesn’t work very well, and not just because of the lack of a verifiable estimate for the associated business costs.  Another significant contributing factor is that people naturally dismiss the possibility that something bad that happened to someone else could also happen to them.

So, when Chicken Little undertakes a journey to tell the CEO that the organization is coming to an end due to poor data quality, exclaiming that “the sky is falling!” while citing one of those data quality disaster stories that befell another organization, should we really be surprised when the CEO looks up, scratches their head, and declares that “the sky isn’t falling on us”?

Sometimes, denying the existence of data quality issues is a natural self-defense mechanism for the people responsible for the business processes and technology surrounding data since nobody wants to be blamed for causing, or failing to fix, data quality issues.  Other times, people suffer from the illusion-of-quality effect caused by the dark side of data cleansing.  In other words, they don’t believe that data quality issues occur very often because the data made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

Can we stop Playing Chicken with Data Quality?

Most of the time, advocating for data quality feels like we are playing chicken with executive sponsors and business stakeholders, as if we were driving toward them at full speed on a collision course, armed with fear mongering and disaster stories, hoping that they swerve in the direction of approving a data quality initiative.  But there has to be a better way to advocate for data quality other than constantly exclaiming that “the sky is falling!”  (Don’t cry fowl — I realize that I just mixed my chicken metaphors.)

Beware the Data Governance Ides of March

Morte de Césare (Death of Caesar) by Vincenzo Camuccini, 1798

Today is the Ides of March (March 15), which back in 44 BC was definitely not a good day to be Julius Caesar, who was literally stabbed in the back by the Roman Senate during his assassination in the Theatre of Pompey (as depicted above).  The assassination was spearheaded by Brutus and Cassius in a failed attempt to restore the Roman Republic, but it instead resulted in a series of civil wars that ultimately led to the establishment of the permanent Roman Empire by Caesar’s heir Octavius (aka Caesar Augustus).

“Beware the Ides of March” is the famously dramatized warning from William Shakespeare’s play Julius Caesar, which has me pondering whether a data governance program implementation has an Ides of March (albeit a less dramatic one—hopefully).

Hybrid Approach (starting Top-Down) is currently leading my unscientific poll about the best way to approach data governance, acknowledging that executive sponsorship and a data governance board will be required for the top-down-driven activities of funding, policy making and enforcement, decision rights, and arbitration of conflicting business priorities as well as organizational politics.

The definition of data governance policies illustrates the intersection of business, data, and technical knowledge spread throughout the organization, revealing how interconnected and interdependent the organization is.  The policies provide a framework for the communication and collaboration of business, data, and technical stakeholders, and establish an enterprise-wide understanding of the roles and responsibilities involved, and the accountability required to support the organization’s daily business activities.

The process of defining data governance policies resembles the communication and collaboration of the Roman Republic, but the process of implementing and enforcing data governance policies resembles the command and control of the Roman Empire.

In this transition of power, from policy definition to policy implementation and enforcement, lies the greatest challenge for a data governance program.  Even though no executive sponsor is the Data Governance Emperor (not even Caesar CEO) and the data governance board is not the Data Governance Senate, a heavy-handed top-down approach to data governance can make policy compliance feel like imperial rule and policy enforcement feel like martial law.  Although a series of enterprise civil wars is unlikely to result, the data governance program is likely to fail without the support of a strong and stable bottom-up foundation.

The enforcement of data governance policies is often confused with traditional management notions of command and control, but the enduring success of data governance requires an organizational culture that embodies communication and collaboration, which is mostly facilitated by bottom-up-driven activities led by the example of data stewards and other peer-level change agents.

“Beware the Data Governance Ides of March” is my dramatized warning about relying too much on the top-down approach to implementing data governance—and especially if your organization has any data stewards named Brutus or Cassius.

Plato’s Data

Plato’s Cave is a famous allegory from philosophy that describes a fictional scenario where people mistake an illusion for reality.

The allegory describes a group of people who have lived their whole lives as prisoners chained motionless in a dark cave, forced to face a blank wall.  Behind the prisoners is a large fire.  In front of the fire are puppeteers that project shadows onto the cave wall, acting out little plays, which include mimicking voices and sound effects that echo off the cave walls.  These shadows and echoes are only projections, partial reflections of a reality created by the puppeteers.  However, this illusion represents the only reality the prisoners have ever known, and so to them the shadows are real sights and the echoes are real sounds.

When one of the prisoners is freed and permitted to turn around and see the source of the shadows and echoes, he rejects reality as an illusion.  The prisoner is then dragged out of the cave into the sunlight, out into the bright, painful light of the real world, which he also rejects as an illusion.  How could these sights and sounds be real to him when all he has ever known is the cave?

But eventually the prisoner acclimates to the real world, realizing that the real illusion was the shadows and echoes in the cave.

Unfortunately, this is when he’s returned to his imprisonment in the cave.  Can you imagine how painful the rest of his life will be, once again being forced to watch the shadows and listen to the echoes — except now he knows that they are not real?

Plato’s Cinema

A modern update on the allegory is something we could call Plato’s Cinema, where a group of people live their whole lives as prisoners chained motionless in a dark cinema, forced to face a blank screen.  Behind the audience is a large movie projector.

Please stop reading for a moment and try to imagine if everything you ever knew was based entirely on the movies you watched.

Now imagine you are one of the prisoners, and you did not get to choose the movies, but instead were forced to watch whatever the projectionist chooses to show you.  Although the fictional characters and stories of these movies are only projections, partial reflections of a reality created by the movie producers, since this illusion would represent the only reality you have ever known, to you the characters would be real people and the stories would be real events.

If you were freed from this cinema prison, permitted to turn around and see the projector, wouldn’t you reject it as an illusion?  If you were dragged out of the cinema into the sunlight, out into the bright, painful light of the real world, wouldn’t you also reject reality as an illusion?  How could these sights and sounds be real to you when all you have ever known is the cinema?

Let’s say that you eventually acclimated to the real world, realizing that the real illusion was the projections on the movie screen.

However, now let’s imagine that you are then returned to your imprisonment in the dark cinema.  Can you imagine how painful the rest of your life would be, once again being forced to watch the movies — except now you know that they are not real?

Plato’s Data

Whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality — let’s call this the allegory of Plato’s Data.

We often act as if we are being forced to face our computer screen, upon which data tells us a story about the real world that is just as enticing as the flickering shadows on the wall of Plato’s Cave, or the mesmerizing movies projected in Plato’s Cinema.

Data shapes our perception of the real world, but sometimes we forget that data is only a partial reflection of reality.

I am sure that it sounds silly to point out something so obvious, but imagine if, before you were freed, the other prisoners, in either the cave or the cinema, tried to convince you that the shadows or the movies weren’t real.  Or imagine you’re the prisoner returning to either the cave or the cinema.  How would you convince other prisoners that you’ve seen the true nature of reality?

A common question about Plato’s Cave is whether it’s crueler to show the prisoner the real world, or to return the prisoner to the cave after he has seen it.  Much like the illusions of the cave and the cinema, data makes more sense the more we believe it is real.

However, with data, neither breaking the illusion nor returning ourselves to it is cruel, but is instead a necessary practice because it’s important to occasionally remind ourselves that data and the real world are not the same thing.

Data Quality in Six Verbs

Once upon a time, when asked on Twitter to identify a list of critical topics for data quality practitioners, my pithy response (with only 140 characters in a tweet, pithy is as good as it gets), especially since I prefer emphasizing the need to take action, was to propose six critical verbs: Investigate, Communicate, Collaborate, Remediate, Inebriate, and Reiterate.

Lest my pith be misunderstood aplenty, this blog post provides more detail, plus links to related posts, about what I meant.

1 — Investigate

Data quality is not exactly a riddle wrapped in a mystery inside an enigma.  However, understanding your data is essential to using it effectively and improving its quality.  Therefore, the first thing you must do is investigate.

So, grab your favorite (preferably highly caffeinated) beverage, get settled into your comfy chair, roll up your sleeves and start analyzing that data.  Data profiling tools can be very helpful with raw data analysis.

However, data profiling is elementary, my dear reader.  In order for you to make sense of those data elements, you require business context.  This means you must also go talk with data’s best friends—its stewards, analysts, and subject matter experts.
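
If it helps to picture that first pass, here is a minimal sketch (using pandas, with made-up column names and values) of the kind of grunt work a data profiling tool automates: null counts, distinct counts, and a simple validity check that tells you which conversations to have next.

    # Hypothetical sketch: a first-pass data profile of a small extract.
    import pandas as pd

    df = pd.DataFrame({
        "customer_id": [101, 102, 102, None],
        "email": ["a@example.com", "b@example", None, "d@example.com"],
    })

    profile = pd.DataFrame({
        "non_null": df.notna().sum(),    # completeness
        "nulls": df.isna().sum(),
        "distinct": df.nunique(),        # duplicate keys and candidate identifiers
    })
    print(profile)

    # A quick validity check that raises questions for the stewards and
    # subject matter experts rather than answering them outright.
    print(df["email"].str.contains("@", na=False).value_counts())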

Six blog posts related to Investigate:

2 — Communicate

After you have completed your preliminary investigation, the next thing you must do is communicate your findings, which helps improve everyone’s understanding of how data is being used, verify data’s business relevancy, and prioritize critical issues.

Keep in mind that communication is mostly about listening.  Also, be prepared to face “data denial” whenever data quality is discussed.  This is a natural self-defense mechanism for the people responsible for business processes, technology, and data, which is understandable because nobody likes to be blamed (or feel blamed) for causing or failing to fix data quality problems.

No matter how uncomfortable these discussions may be at times, they are essential to evaluating the potential ROI of data quality improvements, defining data quality standards, and most importantly, providing a working definition of success.

Six blog posts related to Communicate:

3 — Collaborate

After you have investigated and communicated, now you must rally the team that will work together to improve the quality of your data.  A cross-disciplinary team will be needed because data quality is neither a business nor a technical issue—it is both.

Therefore, you will need the collaborative effort of business and technical folks.  The business folks usually own the data, or at least the business processes that create it, so they understand its meaning and daily use.  The technical folks usually own the hardware and software comprising your data architecture.  Both sets of folks must realize they are all “one company folk” that must collaborate in order to be successful.

No, you don’t need a folk singer, but you may need an executive sponsor.  The need for collaboration might sound rather simple, but as one of my favorite folk singers taught me, sometimes the hardest thing to learn is the least complicated.

Six blog posts related to Collaborate:

4 — Remediate

Resolving data quality issues requires a combination of data cleansing and defect prevention.  Data cleansing is reactive and its common (and deserved) criticism is that it essentially treats the symptoms without curing the disease. 

Defect prevention is proactive and, through root cause analysis and process improvements, it is essentially the cure for the quality ills that ail your data.  However, a data governance framework is often necessary for defect prevention to be successful.  As are patience and understanding, since it will require a strategic organizational transformation that doesn’t happen overnight.

The unavoidable reality is that data cleansing is used to correct today’s problems while defect prevention is busy building a better tomorrow for your organization.  Fundamentally, data quality requires a hybrid discipline that combines data cleansing and defect prevention into an enterprise-wide best practice.
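
As a minimal sketch of that hybrid discipline (in Python, with hypothetical reference values), the same standardization logic can serve both purposes: applied in batch it cleanses the data you already have, and applied at the point of entry it helps prevent the defect from recurring.

    # Hypothetical sketch: one standardization rule, used two ways.
    VALID_COUNTRIES = {"US", "CA", "GB"}
    ALIASES = {"USA": "US", "U.S.": "US", "United States": "US"}

    def cleanse_country(value):
        """Reactive data cleansing: standardize values already in the database."""
        value = value.strip()
        return ALIASES.get(value, value.upper())

    def validate_country(value):
        """Proactive defect prevention: reject bad values before they are stored."""
        standardized = cleanse_country(value)
        if standardized not in VALID_COUNTRIES:
            raise ValueError(f"Unknown country code: {value!r}")
        return standardized

    print(cleanse_country("USA"))       # today's problem corrected to "US"
    try:
        validate_country("Canada")      # tomorrow's defect stopped at the source
    except ValueError as error:
        print(error)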

Six blog posts related to Remediate:

5 — Inebriate

I am not necessarily advocating that kind of inebriation.  Instead, think Emily Dickinson (i.e., “Inebriate of air am I” – it’s a line from a poem about happiness that, yes, also happens to make a good drinking song). 

My point is that you must not only celebrate your successes, but celebrate them quite publicly.  Channel yet another poet (Walt Whitman) and sound your barbaric yawp over the cubicles of your company: “We just improved the quality of our data!”

Of course, you will need to be more specific.  Declare success using words illustrating the business impact of your achievements, such as mitigated risks, reduced costs, or increased revenues — those three are always guaranteed executive crowd pleasers.

Six blog posts related to Inebriate:

6 — Reiterate

Like the legend of the phoenix, the end is also a new beginning.  Therefore, don’t get too inebriated, since you are not celebrating the end of your efforts.  Your data quality journey has only just begun.  Your continuous monitoring must continue and your ongoing improvements must remain ongoing.  Which is why, despite the tension that this reality (and this bad grammatical pun) might cause you, always remember that the tense of all six of these verbs is future continuous.

Six blog posts related to Reiterate:

What Say You?

Please let me know what you think, pithy or otherwise, by posting a comment below.  And feel free to use more than six verbs.

Finding Data Quality

Have you ever experienced that sinking feeling, where you sense if you don’t find data quality, then data quality will find you?

In the spring of 2003, Pixar Animation Studios produced one of my all-time favorite Walt Disney Pictures films—Finding Nemo.

This blog post is an homage not only to the film, but also to the critically important role into which data quality is cast within all of your enterprise information initiatives, including business intelligence, master data management, and data governance.

I hope that you enjoy reading this blog post, but most important, I hope you always remember: “Data are friends, not food.”

Data Silos

“Mine!  Mine!  Mine!  Mine!  Mine!”

That’s the Data Silo Mantra—and it is also the bane of successful enterprise information management.  Many organizations persist in their reliance on vertical data silos, where each and every business unit acts as the custodian of its own private data—thereby maintaining its own version of the truth.

Impressive business growth can cause an organization to become a victim of its own success.  This success can inflict significant collateral damage, most notably on the organization’s burgeoning information architecture.

Earlier in an organization’s history, it usually has fewer systems and easily manageable volumes of data.  This makes managing data quality, and effectively delivering the critical information required to make informed business decisions every day, a relatively easy task where technology can serve business needs well—especially when the business and its needs are small.

However, as the organization grows, it trades effectiveness for efficiency, prioritizing short-term tactics over long-term strategy, and by seeing power in the hoarding of data, not in the sharing of information, the organization chooses business unit autonomy over enterprise-wide collaboration—and without this collaboration, successful enterprise information management is impossible.

A data silo often merely represents a microcosm of an enterprise-wide problem—and this truth is neither convenient nor kind.

Data Profiling

“I see a light—I’m feeling good about my data . . .

Good feeling’s gone—AHH!”

Although it’s not exactly a riddle wrapped in a mystery inside an enigma,  understanding your data is essential to using it effectively and improving its quality—to achieve these goals, there is simply no substitute for data analysis.

Data profiling can provide a reality check for the perceptions and assumptions you may have about the quality of your data.  A data profiling tool can help you by automating some of the grunt work needed to begin your analysis.

However, it is important to remember that the analysis itself cannot be automated—you need to translate your analysis into the meaningful reports and questions that will facilitate more effective communication and help establish tangible business context.

Ultimately, I believe the goal of data profiling is not to find answers, but instead, to discover the right questions. 

Discovering the right questions requires talking with data’s best friends—its stewards, analysts, and subject matter experts.  These discussions are a critical prerequisite for determining data usage, standards, and the business-relevant metrics for measuring and improving data quality.  Always remember that well-performed data profiling is a highly interactive and very iterative process.
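
One way to discover those questions is pattern analysis, a kind of grunt work a data profiling tool can automate.  Here is a minimal sketch (pure Python, with made-up values) that generalizes raw values into format patterns and counts them; the output does not say whether the data is good or bad, but it does tell you exactly what to ask the data’s stewards and subject matter experts.

    # Hypothetical sketch: pattern analysis of a phone number column.
    import re
    from collections import Counter

    phone_numbers = ["555-867-5309", "5558675309", "(555) 867-5309",
                     "555-867-5309", "N/A", ""]

    def pattern(value):
        """Generalize a value into a format pattern (digits -> 9, letters -> A)."""
        generalized = re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))
        return generalized or "<blank>"

    for fmt, count in Counter(pattern(v) for v in phone_numbers).most_common():
        print(f"{count:>3}  {fmt}")

    # Which format is the standard?  Is "A/A" (i.e., "N/A") a valid value?
    # Should blanks be allowed?  Only the business stakeholders can answer.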

Defect Prevention

“You, Data-Dude, takin’ on the defects.

You’ve got serious data quality issues, dude.

Awesome.”

Even though it is impossible to truly prevent every problem before it happens, proactive defect prevention is a highly recommended data quality best practice because the more control enforced where data originates, the better the overall quality will be for enterprise information.

Although defect prevention is most commonly associated with business and technical process improvements, after identifying the burning root cause of your data defects, you may predictably need to apply some of the principles of behavioral data quality.

In other words, understanding the complex human dynamics often underlying data defects is necessary for developing far more effective tactics and strategies for implementing successful and sustainable data quality improvements.
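
As a minimal sketch (in Python, with hypothetical field and source names) of what enforcing control where data originates can look like, the intake check below also tallies rejected records by source, giving root cause analysis, and the behavioral conversations that usually follow, somewhere concrete to start.

    # Hypothetical sketch: validate at the point of entry, count defects by source.
    from collections import Counter

    REQUIRED_FIELDS = ("customer_id", "email", "postal_code")
    rejects_by_source = Counter()

    def accept_record(record, source):
        """Admit a record only if every required field is present and non-blank."""
        missing = [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]
        if missing:
            rejects_by_source[source] += 1
            return False, missing
        return True, []

    accept_record({"customer_id": "101", "email": ""}, source="web_form")
    accept_record({"customer_id": "102", "email": "b@example.com",
                   "postal_code": "02134"}, source="call_center")
    print(rejects_by_source)            # e.g., Counter({'web_form': 1})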

Data Cleansing

“Just keep cleansing.  Just keep cleansing.

Just keep cleansing, cleansing, cleansing.

What do we do?  We cleanse, cleanse.”

That’s not the Data Cleansing Theme Song—but it can sometimes feel like it.  Especially whenever poor data quality negatively impacts decision-critical information, the organization may legitimately prioritize a reactive short-term response, where the only remediation will be fixing the immediate problems.

Balancing the demands of this data triage mentality with the best practice of implementing defect prevention wherever possible will often create a very challenging situation for you to contend with on an almost daily basis.

Therefore, although comprehensive data remediation will require combining reactive and proactive approaches to data quality, you need to be willing and able to put data cleansing tools to good use whenever necessary.

Communication

“It’s like he’s trying to speak to me, I know it.

Look, you’re really cute, but I can’t understand what you’re saying.

Say that data quality thing again.”

I hear this kind of thing all the time (well, not the “you’re really cute” part).

Effective communication improves everyone’s understanding of data quality, establishes a tangible business context, and helps prioritize critical data issues. 

Keep in mind that communication is mostly about listening.  Also, be prepared to face “data denial” when data quality problems are discussed.  Most often, this is a natural self-defense mechanism for the people responsible for business processes, technology, and data—and because of the simple fact that nobody likes to feel blamed for causing or failing to fix the data quality problems.

The key to effective communication is clarity.  You should always make sure that all data quality concepts are clearly defined and in a language that everyone can understand.  I am not just talking about translating the techno-mumbojumbo, because even business-speak can sound more like business-babbling—and not just to the technical folks.

Additionally, don’t be afraid to ask questions or admit when you don’t know the answers.  Many costly mistakes can be made when people assume that others know (or pretend to know themselves) what key concepts and other terminology actually mean.

Never underestimate the potential negative impacts that the point of view paradox can have on communication.  For example, the perspectives of the business and technical stakeholders can often appear to be diametrically opposed.

Practicing effective communication requires shutting our mouth, opening our ears, and empathically listening to each other, instead of continuing to practice ineffective communication, where we merely take turns throwing word-darts at each other.

Collaboration

“Oh and one more thing:

When facing the daunting challenge of collaboration,

Work through it together, don't avoid it.

Come on, trust each other on this one.

Yes—trust—it’s what successful teams do.”

Most organizations suffer from a lack of collaboration, and as noted earlier, without true enterprise-wide collaboration, true success is impossible.

Beyond the data silo problem, the most common challenge for collaboration is the divide perceived to exist between the Business and IT, where the Business usually owns the data and understands its meaning and use in the day-to-day operation of the enterprise, and IT usually owns the hardware and software infrastructure of the enterprise’s technical architecture.

However, neither the Business nor IT alone has all of the necessary knowledge and resources required to truly be successful.  Data quality requires that the Business and IT forge an ongoing and iterative collaboration.

You must rally the team that will work together to improve the quality of your data.  A cross-disciplinary team will truly be necessary because data quality is neither a business issue nor a technical issue—it is both, truly making it an enterprise issue.

Executive sponsors, business and technical stakeholders, business analysts, data stewards, technology experts, and yes, even consultants and contractors—only when all of you are truly working together as a collaborative team, can the enterprise truly achieve great things, both tactically and strategically.

Successful enterprise information management is spelled E—A—C.

Of course, that stands for Enterprises—Always—Collaborate.  The EAC can be one seriously challenging place, dude.

You don’t know if you know what they know, or if they know what you know, but when you know, then they know, you know?

It’s like first you are all like “Whoa!” and they are all like “Whoaaa!” then you are like “Sweet!” and then they are like “Totally!”

This critical need for collaboration might seem rather obvious.  However, as all of the great philosophers have taught us, sometimes the hardest thing to learn is the least complicated.

Okay.  Squirt will now give you a rundown of the proper collaboration technique:

“Good afternoon. We’re gonna have a great collaboration today.

Okay, first crank a hard cutback as you hit the wall.

There’s a screaming bottom curve, so watch out.

Remember: rip it, roll it, and punch it.”

Finding Data Quality

As more and more organizations realize the critical importance of viewing data as a strategic corporate asset, data quality is becoming an increasingly prevalent topic of discussion.

However, and somewhat understandably, data quality is sometimes viewed as a small fish—albeit with a “lucky fin”—in a much larger pond.

In other words, data quality is often discussed only in its relation to enterprise information initiatives such as data integration, master data management, data warehousing, business intelligence, and data governance.

There is nothing wrong with this perspective, and as a data quality expert, I admit to my general tendency to see data quality in everything.  However, regardless of the perspective from which you begin your journey, I believe that eventually you will be Finding Data Quality wherever you look as well.

Is DG a D-O-G?

Convincing your organization to invest in a sustained data quality program implemented within a data governance framework can be a very difficult task requiring an advocate with a championship pedigree.  But sometimes it seems like no matter how persuasive your sales pitch is, even when your presentation is judged best in show, it appears to fall on deaf ears.

Perhaps, data governance (DG) is a D-O-G.  In other words, maybe the DG message is similar to a sound only dogs can hear.

Galton’s Whistle

In the late 19th century, Francis Galton developed a whistle (now more commonly called a dog whistle), which he used to test the range of frequencies that could be heard by various animals.  Galton was conducting experiments on human faculties, including the range of human hearing.  Although not its intended purpose, today Galton’s whistle is used by dog trainers.  By varying the frequency of the whistle, it emits a sound (inaudible to humans) used either to simply get a dog’s attention, or alternatively to inflict pain for the purpose of correcting undesirable behavior.

Bad Data, Bad, Bad Data!

Many organizations do not become aware of the importance of data governance until poor data quality repeatedly “bites” critical business decisions.  Typically following a very nasty bite, executives scream “bad data, bad, bad data!” without stopping to realize that the enterprise’s poor data management practices unleashed the perpetually bad data now running amok within their systems.

For these organizations, advocacy of proactive defect prevention was an inaudible sound, and now the executives blow harshly into their data whistle and demand a one-time data cleansing project to correct the current data quality problems.

However, even after the project is over, it’s often still a doggone crazy data world.

The Data Whisperer

Executing disconnected one-off projects to deal with data issues when they become too big to ignore doesn’t work because it doesn’t identify and correct the root causes of data’s bad behavior.  By advocating root cause analysis and business process improvement, data governance can essentially be understood as The Data Whisperer.

Data governance defines policies and procedures for aligning data usage with business metrics, establishes data stewardship, prioritizes data quality issues, and facilitates collaboration among all of the business and technical stakeholders.

Data governance enables enterprise-wide data quality by combining data cleansing (which will still occasionally be necessary) and defect prevention into a hybrid discipline, which will result in you hearing everyday tales about data so well behaved that even your executives’ tails will be wagging.

Data’s Best Friend

Without question, data governance is very disruptive to an organization’s status quo.  It requires patience, understanding, and dedication because it will require a strategic enterprise-wide transformation that doesn’t happen overnight.

However, data governance is also data’s best friend. 

And in order for your organization to be successful, you have to realize that data is also your best friend.  Data governance will help you take good care of your data, which in turn will take good care of your business.

Basically, the success of your organization comes down to a very simple question — Are you a DG person?

The Stone Wars of Root Cause Analysis

“As a single stone causes concentric ripples in a pond,” Martin Doyle commented on my blog post There is No Such Thing as a Root Cause, “there will always be one root cause event creating the data quality wave.  There may be interference after the root cause event which may look like a root cause, creating eddies of side effects and confusion, but I believe there will always be one root cause.  Work backwards from the data quality side effects to the root cause and the data quality ripples will be eliminated.”

Martin Doyle and I continued our congenial blog comment banter on my podcast episode The Johari Window of Data Quality, but in this blog post I wanted to focus on the stone-throwing metaphor for root cause analysis.

Let’s begin with the concept of a single stone causing the concentric ripples in a pond.  Is the stone really the root cause?  Who threw the stone?  Why did that particular person choose to throw that specific stone?  How did the stone come to be alongside the pond?  Which path did the stone-thrower take to get to the pond?  What happened to the stone-thrower earlier in the day that made them want to go to the pond, and once there, pick up a stone and throw it in the pond?

My point is that while root cause analysis is important to data quality improvement, too often we can get carried away riding the ripples of what we believe to be the root cause of poor data quality.  Adding to the complexity is the fact that there’s hardly ever just one stone.  Many stones get thrown into our data ponds, and trying to un-ripple their poor quality effects can lead us to false conclusions because causation is non-linear in nature.  Causation is a complex network of many interrelated causes and effects, so some of what appear to be the effects of the root cause you have isolated may, in fact, be the effects of other causes.

As Laura Sebastian-Coleman explains, data quality assessments are often “a quest to find a single criminal—The Root Cause—rather than to understand the process that creates the data and the factors that contribute to data issues and discrepancies.”  Those approaching data quality this way, “start hunting for the one thing that will explain all the problems.  Their goal is to slay the root cause and live happily ever after.  Their intentions are good.  And slaying root causes—such as poor process design—can bring about improvement.  But many data problems are symptoms of a lack of knowledge about the data and the processes that create it.  You cannot slay a lack of knowledge.  The only way to solve a knowledge problem is to build knowledge of the data.”

Believing that you have found and eliminated the root cause of all your data quality problems is like believing that after you have removed the stones from your pond (i.e., data cleansing), you can stop the stone-throwers by building a high stone-deflecting wall around your pond (i.e., defect prevention).  However, there will always be stones (i.e., data quality issues) and there will always be stone-throwers (i.e., people and processes) that will find a way to throw a stone in your pond.

In our recent podcast Measuring Data Quality for Ongoing Improvement, Laura Sebastian-Coleman and I discussed how, although root cause is used as a singular noun (just as data is often used as a singular noun), we should talk about root causes, since, just as data analysis is not analysis of a single datum, root cause analysis should not be viewed as analysis of a single root cause.

The bottom line, or, if you prefer, the ripple at the bottom of the pond, is the Stone Wars of Root Cause Analysis will never end because data quality is a journey, not a destination.  After all, that’s why it’s called ongoing data quality improvement.