You only get a Return from something you actually Invest in

In my previous post, I took a slightly controversial stance on a popular three-word phrase — Root Cause Analysis.  In this post, I take on another popular three-word phrase — Return on Investment (most commonly abbreviated as ROI).

What is the ROI of purchasing a data quality tool or launching a data governance program?

Zero.  Zip.  Zilch.  Intet.  Ingenting.  Rien.  Nada.  Nothing.  Nichts.  Niets.  Null.  Niente.  Bupkis.

There is No Such Thing as the ROI of purchasing a data quality tool or launching a data governance program.

Before you hire “The Butcher” to eliminate me for being The Man Who Knew Too Little about ROI, please allow me to explain.

Returns only come from Investments

Although you likely purchased a data quality tool because you have business-critical data quality problems, simply purchasing a tool is not an investment (unless you believe in Magic Beans) since the tool itself is not a solution.

You use tools to build, test, implement, and maintain solutions.  For example, I spent several hundred dollars on new power tools last year for a home improvement project.  However, I haven’t received any return on my home improvement investment for a simple reason — I still haven’t even taken most of the tools out of their packaging yet.  In other words, I barely even started my home improvement project.  It is precisely because I haven’t invested any time and effort that I haven’t seen any returns.  And it certainly isn’t going to help me (although it would help Home Depot) if I believed buying even more new tools was the answer.

Although you likely launched a data governance program because you have complex issues involving the intersection of data, business processes, technology, and people, simply launching a data governance program is not an investment since launching, by itself, does not conjure the three most important letters: R-O-I.

Data is only an Asset if Data is a Currency

In his book UnMarketing, Scott Stratten discusses this within the context of the ROI of social media (a commonly misunderstood aspect of social media strategy), but his insight is just as applicable to any discussion of ROI.  “Think of it this way: You wouldn’t open a business bank account and ask to withdraw $5,000 before depositing anything. The banker would think you are a loony.”

Yet, as Stratten explained, people do this all the time in social media by failing to build up what is known as social currency.  “You’ve got to invest in something before withdrawing. Investing your social currency means giving your time, your knowledge, and your efforts to that channel before trying to withdraw monetary currency.”

The same logic applies perfectly to data quality and data governance, where we could say it’s the failure to build up what I will call data currency.  You’ve got to invest in data before you can ever consider data an asset to your organization.  Investing your data currency means giving your time, your knowledge, and your efforts to data quality and data governance before trying to withdraw monetary currency (i.e., before trying to calculate the ROI of a data quality tool or a data governance program).

If you actually want to get a return on your investment, then actually invest in your data.  Invest in doing the hard daily work of continuously improving your data quality and putting into practice your data governance principles, policies, and procedures.

Data is only an asset if data is a currency.  Invest in your data currency, and you will eventually get a return on your investment.

You only get a return from something you actually invest in.

Related Posts

Can Enterprise-Class Solutions Ever Deliver ROI?

Do you believe in Magic (Quadrants)?

Which came first, the Data Quality Tool or the Business Need?

What Data Quality Technology Wants

A Farscape Analogy for Data Quality

The Data Quality Wager

“Some is not a number and soon is not a time”

The Dumb and Dumber Guide to Data Quality

There is No Such Thing as a Root Cause

Root cause analysis.  Most people within the industry, myself included, often discuss the importance of determining the root cause of data governance and data quality issues.  However, the complex cause and effect relationships underlying an issue mean that when an issue is encountered, you are often only seeing one of the numerous effects of its root cause (or causes).

In my post The Root! The Root! The Root Cause is on Fire!, I poked fun at those resistant to root cause analysis with the lyrics:

The Root! The Root! The Root Cause is on Fire!
We don’t want to determine why, just let the Root Cause burn.
Burn, Root Cause, Burn!

However, I think that the time is long overdue for even me to admit the truth — There is No Such Thing as a Root Cause.

Before you charge at me with torches and pitchforks for having an Abby Normal brain, please allow me to explain.

 

Defect Prevention, Mouse Traps, and Spam Filters

Some advocates of defect prevention claim that zero defects is not only a useful motivation, but also an attainable goal.  In my post The Asymptote of Data Quality, I quoted Daniel Pink’s book Drive: The Surprising Truth About What Motivates Us:

“Mastery is an asymptote.  You can approach it.  You can home in on it.  You can get really, really, really close to it.  But you can never touch it.  Mastery is impossible to realize fully.

The mastery asymptote is a source of frustration.  Why reach for something you can never fully attain?

But it’s also a source of allure.  Why not reach for it?  The joy is in the pursuit more than the realization.

In the end, mastery attracts precisely because mastery eludes.”

The mastery of defect prevention is sometimes distorted into a belief in data perfection, a belief that we can not only build a better mousetrap, but also build a mousetrap that catches all the mice, or that by placing a mousetrap in our garage, which prevents mice from entering via the garage, we somehow also prevent mice from finding another way into our house.

Obviously, we can’t catch all the mice.  However, that doesn’t mean we should let the mice be like Pinky and the Brain:

Pinky: “Gee, Brain, what do you want to do tonight?”

The Brain: “The same thing we do every night, Pinky — Try to take over the world!”

My point is that defect prevention is not the same thing as defect elimination.  Defects evolve.  An excellent example of this is spam.  Even conservative estimates indicate almost 80% of all e-mail sent worldwide is spam.  A similar percentage of blog comments are spam, and spam-generating bots are quite prevalent on Twitter and other micro-blogging and social networking services.  The inconvenient truth is that as we build better and better spam filters, spammers create better and better spam.

Just as mousetraps don’t eliminate mice and spam filters don’t eliminate spam, defect prevention doesn’t eliminate defects.

However, mousetraps, spam filters, and defect prevention are essential proactive best practices.

 

There are No Lines of Causation — Only Loops of Correlation

There are no root causes, only strong correlations.  And correlations are strengthened by continuous monitoring.  Believing there are root causes means believing continuous monitoring, and by extension, continuous improvement, has an end point.  I call this the defect elimination fallacy, which I parodied in song in my post Imagining the Future of Data Quality.

Knowing there are only strong correlations means knowing continuous improvement is an infinite feedback loop.  A practical example of this reality comes from data-driven decision making, where:

  1. Better Business Performance is often correlated with
  2. Better Decisions, which, in turn, are often correlated with
  3. Better Data, which is precisely why Better Decisions with Better Data is foundational to Business Success — however . . .

This does not mean that we can draw straight lines of causation between (3) and (1), (3) and (2), or (2) and (1).

Despite our preference for simplicity over complexity, if bad data were the root cause of bad decisions and/or bad business performance, then no organization would ever be profitable, and if good data were the root cause of good decisions and/or good business performance, then every organization would always be profitable.  Even if good data were a root cause, not just a correlation, and even when data perfection is temporarily achieved, the effects would still be ephemeral because not only do defects evolve, but so does the business world.  This evolution requires an endless revolution of continuous monitoring and improvement.

Many organizations implement data quality thresholds to close the feedback loop evaluating the effectiveness of their data management and data governance, but few implement decision quality thresholds to close the feedback loop evaluating the effectiveness of their data-driven decision making.
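As a rough sketch of what closing both loops could look like in practice (the metric definitions and threshold values below are purely hypothetical, not a recommendation), an organization might monitor a data quality threshold and a decision quality threshold side by side:

    # Hypothetical sketch: closing both feedback loops with simple thresholds.
    DATA_QUALITY_THRESHOLD = 0.98      # e.g., share of records passing validation rules
    DECISION_QUALITY_THRESHOLD = 0.75  # e.g., share of decisions that met their business target

    def data_quality_score(records):
        """Fraction of records that pass every validation rule."""
        return sum(1 for r in records if r["passed_validation"]) / len(records) if records else 0.0

    def decision_quality_score(decisions):
        """Fraction of decisions whose measured business result met its target."""
        return sum(1 for d in decisions if d["result"] >= d["target"]) / len(decisions) if decisions else 0.0

    def close_feedback_loops(records, decisions):
        alerts = []
        if data_quality_score(records) < DATA_QUALITY_THRESHOLD:
            alerts.append("Data quality below threshold: review data management and governance practices.")
        if decision_quality_score(decisions) < DECISION_QUALITY_THRESHOLD:
            alerts.append("Decision quality below threshold: review data-driven decision making practices.")
        return alerts  # both loops report back, reinforcing correlation rather than declaring causation

The specific numbers matter far less than the fact that both loops report back continuously.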

The quality of a decision is determined by the business results it produces, not by the person who made the decision, the quality of the data used to support the decision, or even the decision-making technique.  Of course, the reality is that business results are often not immediate and may sometimes be contingent upon the complex interplay of multiple decisions.

Even though evaluating decision quality only establishes a correlation, and not a causation, between the decision execution and its business results, it is still essential to continuously monitor data-driven decision making.

Although the business world will never be totally predictable, we can not turn a blind eye to the need for data-driven decision making best practices, to the reality that no best practice can eliminate the potential for poor data quality and decision quality, or to the potential for poor business results despite better data quality and decision quality.  Central to continuous improvement is the importance of closing the feedback loops that make data-driven decisions more transparent through better monitoring, allowing the organization to learn from its decision-making mistakes, and make adjustments when necessary.

We need to connect the dots of better business performance, better decisions, and better data by drawing loops of correlation.

 

Decision-Data Feedback Loop

Continuous improvement enables better decisions with better data, which drives better business performance — as long as you never stop looping the Decision-Data Feedback Loop, and start accepting that there is no such thing as a root cause.

I discuss this, and other aspects of data-driven decision making, in my DataFlux white paper, which is available for download (registration required) using the following link: Decision-Driven Data Management

 

Related Posts

The Root! The Root! The Root Cause is on Fire!

Bayesian Data-Driven Decision Making

The Role of Data Quality Monitoring in Data Governance

The Circle of Quality

Oughtn’t you audit?

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Imagining the Future of Data Quality

What going to the Dentist taught me about Data Quality

DQ-Tip: “There is No Such Thing as Data Accuracy...”

The HedgeFoxian Hypothesis

The Speed of Decision

In a previous post, I used the Large Hadron Collider as a metaphor for big data and big analytics, where the creative destruction caused by high-velocity collisions of large volumes of varying data attempts to reveal elementary particles of business intelligence.

Since recent scientific experiments have sparked discussion about the possibility of exceeding the speed of light, in this blog post I examine whether it’s possible to exceed the speed of decision (i.e., the constraints that time puts on data-driven decision making).

 

Is Decision Speed more important than Data Quality?

In my blog post Thaler’s Apples and Data Quality Oranges, I explained how time-inconsistent data quality preferences within business intelligence reflect the reality that with the speed at which things change these days, more near-real-time operational business decisions are required, which sometimes makes decision speed more important than data quality.

Even though advancements in computational power, network bandwidth, parallel processing frameworks (e.g., MapReduce), scalable and distributed models (e.g., cloud computing), and other techniques (e.g., in-memory computing) are making real-time data-driven decisions more technologically possible than ever before, as I explained in my blog post Satisficing Data Quality, data-driven decision making often has to contend with the practical trade-offs between correct answers and timely answers.

Although we can’t afford to completely sacrifice data quality for faster business decisions, and obviously high quality data is preferable to poor quality data, less than perfect data quality can not be used as an excuse to delay making a critical decision.
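To make that trade-off concrete, here is a minimal sketch of a satisficing query (the function, batch data, and deadline below are hypothetical): it refines an estimate only for as long as the decision deadline allows, then returns the best available answer instead of waiting for a perfect one.

    import time

    def satisficing_estimate(data_batches, deadline_seconds, good_enough_margin=0.05):
        """Refine an estimate batch by batch, stopping when the decision deadline
        arrives or the estimate stops changing by more than the margin."""
        start = time.monotonic()
        total, count, estimate = 0.0, 0, None
        for batch in data_batches:
            total += sum(batch)
            count += len(batch)
            new_estimate = total / count
            converged = estimate is not None and abs(new_estimate - estimate) <= good_enough_margin
            estimate = new_estimate
            if converged or (time.monotonic() - start) >= deadline_seconds:
                break  # a timely, good-enough answer beats a late, perfect one
        return estimate

    # Decide with whatever has arrived within a two-second window.
    print(satisficing_estimate([[10, 12, 11], [13, 9, 10], [11, 12, 10]], deadline_seconds=2.0))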

 

Is Decision Speed more important than Decision Quality?

The increasing demand for real-time data-driven decisions is requiring us to re-evaluate more than just our data quality thresholds.  In my blog post The Circle of Quality, I explained the connection between data quality and decision quality, and how result quality trumps them both because an organization’s success is measured by the quality of the business results it produces.

Again, with the speed at which the business world now changes, the reality is that the fear of making a mistake can not be used as an excuse to delay making a critical decision, which sometimes makes decision speed more important than decision quality.

“Fail faster” has long been hailed as the mantra of business innovation.  It’s not because failure is a laudable business goal, but instead because the faster you can identify your mistakes, the faster you can correct your mistakes.  Of course this requires that you are actually willing to admit you made a mistake.

(As an aside, I often wonder what’s more difficult for an organization to admit: poor data quality or poor decision quality?)

Although good decisions are obviously preferable to bad decisions, we have to acknowledge the fragility of our knowledge and accept that mistake-driven learning is an essential element of efficient and effective data-driven decision making.

Although the speed of decision is not the same type of constant as the speed of light, in our constantly changing business world, the speed of decision represents the constant demand for good-enough data for fast-enough decisions.

 

Related Posts

The Big Data Collider

A Decision Needle in a Data Haystack

The Data-Decision Symphony

Thaler’s Apples and Data Quality Oranges

Satisficing Data Quality

Data Confabulation in Business Intelligence

The Data that Supported the Decision

Data Psychedelicatessen

OCDQ Radio - Big Data and Big Analytics

OCDQ Radio - Good-Enough Data for Fast-Enough Decisions

Data, Information, and Knowledge Management

Data In, Decision Out

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

The Circle of Quality

The Data Cold War

One of the many things I love about Twitter is its ability to spark ideas via real-time conversations.  For example, while live-tweeting during last week’s episode of DM Radio, the topic of which was how to get started with data governance, I tweeted about the data silo challenges and corporate cultural obstacles being discussed.

I tweeted that data is an asset only if it is a shared asset, across the silos, across the corporate culture, and that, in order to be successful with data governance, organizations must replace the mantra “my private knowledge is my power” with “our shared knowledge empowers us all.”

“That’s very socialist thinking,” Mark Madsen responded.  “Soon we’ll be having arguments about capitalizing over socializing our data.”

To which I responded that the more socialized data is, the more capitalized data can become . . . just ask Google.

“Oh no,” Mark humorously replied, “decades of political rhetoric about socialism to be ruined by a discussion of data!”  And I quipped that discussions about data have been accused of worse, and decades of data rhetoric certainly hasn’t proven very helpful in corporate politics.

 

Later, while ruminating on this light-hearted exchange, I wondered if we actually are in the midst of the Data Cold War.

 

The Data Cold War

The Cold War, which lasted approximately from 1946 to 1991, was the political, military, and economic competition between the Communist World, primarily the former Soviet Union, and the Western world, primarily the United States.  One of the major tenets of the Cold War was the conflicting ideologies of socialism and capitalism.

In enterprise data management, one of the most debated ideologies is whether or not data should be viewed as a corporate asset, especially by the for-profit corporations of capitalism, which was the world’s dominant economic model even before the Cold War began, and will likely forever remain so.

My earlier remark that data is an asset only if it is a shared asset, across the silos, across the corporate culture, is indicative of the bounded socialist view of enterprise data.  In other words, almost no one in the enterprise data management space is suggesting that data should be shared beyond the boundary of the organization.  In this sense, advocates of data governance, including myself, are advocating socializing data within the enterprise so that data can be better capitalized as a true corporate asset.

This mindset makes sense because sharing data with the world, especially for free, couldn’t possibly be profitable — or could it?

 

The Master Data Management Magic Trick

The genius (and some justifiably ponder if it’s evil genius) of companies like Google and Facebook is they realized how to make money in a free world — by which I mean the world of Free: The Future of a Radical Price, the 2009 book by Chris Anderson.

By encouraging their users to freely share their own personal data, Google and Facebook ingeniously answer what David Loshin calls the most dangerous question in data management: What is the definition of customer?

How do Google and Facebook answer the most dangerous question?

A customer is a product.

This is the first step that begins what I call the Master Data Management Magic Trick.

Instead of trying to manage the troublesome master data domain of customer and link it, through sales transaction data, to the master data domain of product (products, by the way, have always been undeniably accepted as a corporate asset even though product data has not been), Google and Facebook simply eliminate the need for customers (and, by extension, eliminate the need for customer service because, since their product is free, it has no customers) by transforming what would otherwise be customers into the very product that they sell — and, in fact, the only “real” product that they have.

And since what their users perceive as their product is virtual (i.e., entirely Internet-based), it’s not really a product, but instead a free service, which can be discontinued at any time.  And if it was, who would you complain to?  And on what basis?

After all, you never paid for anything.

This is the second step that completes the Master Data Management Magic Trick — a product is a free service.

Therefore, Google and Facebook magically make both their customers and their products (i.e., master data) disappear, while simultaneously making billions of dollars (i.e., transaction data) appear in their corporate bank accounts.

(Yes, the personal data of their users is master data.  However, because it is used in an anonymized and aggregated format, it is not, nor does it need to be, managed like the master data we talk about in the enterprise data management industry.)

 

Google and Facebook have Capitalized Socialism

By “empowering” us with free services, Google and Facebook use the power of our own personal data against us — by selling it.

However, it’s important to note that they indirectly sell our personal data as anonymized and aggregated demographic data.

Although they do not directly sell our individually identifiable information (because, truthfully, it has very limited value, and selling it directly would mostly be illegal, i.e., identity theft), Google and Facebook do occasionally get sued (mostly outside the United States) for violating data privacy and data protection laws.

However, it’s precisely because we freely give our personal data to them, that until, or if, laws are changed to protect us from ourselves, it’s almost impossible to prove they are doing anything illegal (again, their undeniable genius is arguably evil genius).
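As a rough illustration of what anonymized and aggregated means in this context, here is a minimal sketch (the profile fields and segment keys are hypothetical) that reduces individually identifiable profiles to demographic segment counts before anything leaves the system:

    from collections import Counter

    def to_demographic_segments(user_profiles):
        """Aggregate identifiable profiles into anonymized segment counts."""
        segments = Counter()
        for profile in user_profiles:
            age_band = f"{(profile['age'] // 10) * 10}s"  # e.g., 34 -> "30s"
            segments[(age_band, profile["region"], profile["interest"])] += 1
        return dict(segments)  # only counts leave; no names or identifiers do

    users = [
        {"name": "Alice", "age": 34, "region": "US-East", "interest": "running"},
        {"name": "Bob",   "age": 37, "region": "US-East", "interest": "running"},
        {"name": "Carol", "age": 52, "region": "EU-West", "interest": "cooking"},
    ]
    print(to_demographic_segments(users))
    # {('30s', 'US-East', 'running'): 2, ('50s', 'EU-West', 'cooking'): 1}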

Google and Facebook are the exact same kind of company — they are both Internet advertising agencies.

They both sell online advertising space to other companies, which are looking to demographically target prospective customers because those companies actually do view people as potential real customers for their own real products.

The irony is that if all of their users stopped using their free service, then not only would our personal data be more private and more secure, but the new revenue streams of Google and Facebook would eventually dry up because, specifically by design, they have neither real customers nor real products.  More precisely, their only real customers (other companies) would stop buying advertising from them because no one would ever see and (albeit, even now, only occasionally) click on their ads.

Essentially, companies like Google and Facebook are winning the Data Cold War because they have capitalized socialism.

In other words, the bottom line is Google and Facebook have socialized data in order to capitalize data as a true corporate asset.

 

Related Posts

Freemium is the future – and the future is now

The Age of the Platform

Amazon’s Data Management Brain

The Semantic Future of MDM

A Brave New Data World

Big Data and Big Analytics

A Farscape Analogy for Data Quality

Organizing For Data Quality

Sharing Data

Song of My Data

Data in the (Oscar) Wilde

The Most August Imagination

Once Upon a Time in the Data

The Idea of Order in Data

Hell is other people’s data

Data, Information, and Knowledge Management

The difference, and the relationship, between data and information is a common subject of debate.  Not only do these two terms have varying definitions, but they are often used interchangeably.  Just a few examples include comparing and contrasting data quality with information quality, data management with information management, and data governance with information governance.

In a previous blog post, I referenced the Information Hierarchy provided by Professor Ray R. Larson of the School of Information at the University of California, Berkeley:

  • Data – The raw material of information
  • Information – Data organized and presented by someone
  • Knowledge – Information read, heard, or seen, and understood
  • Wisdom – Distilled and integrated knowledge and understanding

Some consider this an esoteric debate between data geeks and information nerds, but what is not debated is the importance of understanding how organizations use data and/or information to support their business activities.  Of particular interest is the organization’s journey from data to decision, the latter of which is usually considered the primary focus of business intelligence.

In his recent blog post, Scott Andrews explained what he called The Information Continuum:

  • Data – A Fact or a piece of information, or a series thereof
  • Information – Knowledge discerned from data
  • Business Intelligence – Information Management pertaining to an organization’s policy or decision-making, particularly when tied to strategic or operational objectives

 

Knowledge Management

Data Cake
Image by EpicGraphic

This recent graphic does a great job of visualizing the difference between data and information, as well as the importance of how information is presented.  Although the depiction of knowledge as consumed information is oversimplified, I am not sure how this particular visual metaphor could properly represent knowledge as actually understanding the consumed information.

It’s been a while since the term knowledge management was in vogue within the data management industry.  When I began my career, in the early 1990s, I remember hearing about knowledge management as often as we hear about data governance today, which, as you know, is quite often.  I have resurrected the term in this blog post because I can’t help but wonder if the debate about data and information obfuscates the fact that the organization’s appetite, its business hunger, is for knowledge.

 

Three Questions for You

  1. Does your organization make a practical distinction between data and information?
  2. If so, how does this distinction affect your quality, management, and governance initiatives?
  3. What is the relationship between those initiatives and your business intelligence efforts?

 

Please share your thoughts and experiences by posting a comment below.

 

Related Posts

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Data In, Decision Out

The Data-Decision Symphony

Data Confabulation in Business Intelligence

Thaler’s Apples and Data Quality Oranges

DQ-View: Baseball and Data Quality

Beyond a “Single Version of the Truth”

The Business versus IT—Tear down this wall!

Finding Data Quality

Fantasy League Data Quality

The Circle of Quality

Data Governance Star Wars

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.


Shown above are the poll results from the recent Star Wars themed blog debate about one of data governance’s biggest challenges, how to balance bureaucracy and business agility.  Rob Karel took the position for Bureaucracy as Darth Karel of the Empire, and I took the position for Agility as OCDQ-Wan Harris of the Rebellion.

However, this was a true debate format where Rob and I intentionally argued polar opposite positions with full knowledge that the reality is data governance success requires effectively balancing bureaucracy and business agility.

Just in case you missed the blog debate, here are the post links:

On this special, extended, and Star Wars themed episode of OCDQ Radio, I am joined by Rob Karel and Gwen Thomas to discuss this common challenge of effectively balancing bureaucracy and business agility on data governance programs.

Rob Karel is a Principal Analyst at Forrester Research, where he serves Business Process and Applications Professionals.  Rob is a leading expert in how companies manage data and integrate information across the enterprise.  His current research focus includes process data management, master data management, data quality management, metadata management, data governance, and data integration technologies.  Rob has more than 19 years of data management experience, working in both business and IT roles to develop solutions that provide better quality, confidence in, and usability of critical enterprise data.

Gwen Thomas is the Founder and President of The Data Governance Institute, a vendor-neutral, mission-based organization with three arms: publishing free frameworks and guidance, supporting communities of practitioners, and offering training and consulting.  Gwen also writes the popular blog Data Governance Matters, frequently contributes to IT and business publications, and is the author of the book Alpha Males and Data Disasters: The Case for Data Governance.

This extended episode of OCDQ Radio is 49 minutes long, and is divided into two parts, which are separated by a brief Star Wars themed intermission.  In Part 1, Rob and I discuss our blog debate.  In Part 2, Gwen joins us to provide her excellent insights.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Stuck in the Middle with Data Governance

Perhaps the most common debate about data governance is whether it should be started from the top down or the bottom up.

Data governance requires the coordination of a complex combination of a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology, policy enforcement—and obviously many other factors as well.

This common debate is understandable since some of these data governance success factors are mostly top-down (e.g., funding), and some of these data governance success factors are mostly bottom-up (e.g., data quality remediation and data stewardship).

However, the complexity that stymies many organizations is that most data governance success factors are somewhere in the middle.

 

Stuck in the Middle with Data Governance

At certain times during the evolution of a data governance program, top-down aspects will be emphasized, and at other times, bottom-up aspects will be emphasized.  So whether you start from the top down or the bottom up, eventually you are going to need to blend together top-down and bottom-up aspects in order to sustain an ongoing and pervasive data governance program.

To paraphrase The Beatles, when you get to the bottom, you go back to the top, where you stop and turn, and you go for a ride until you get to the bottom—and then you do it again.  (But hopefully your program doesn’t get code-named: “Helter Skelter”)

But after some initial progress has been made, to paraphrase Stealers Wheel, people within the organization may start to feel like we have top-down to the left of us, bottom-up to the right of us, and here we are—stuck in the middle with data governance.

In other words, data governance is never a direct current flowing in only one direction, top-down or bottom-up; instead, it continually flows in an alternating current between the two.  When this dynamic is not communicated to everyone throughout the organization, progress is disrupted by people waiting around for someone else to complete the circuit.

But when, paraphrasing Pearl Jam, data governance is taken up by the middle—then there ain’t gonna be any middle any more.

In other words, when data governance pervades every level of the organization, everyone stops thinking in terms of top-down and bottom-up, and acts like an enterprise in the midst of sustaining the momentum of a successful data governance program.

 

Data Governance Conference


Next week, I will be attending the Data Governance and Information Quality Conference, which will be held June 27-30 in San Diego, California at the Catamaran Resort Hotel and Spa.

If you will also be attending, and you want to schedule a meeting with me: Contact me via email

If you will not be attending, you can follow the conference tweets using the hashtag: #DGIQ2011

 

Related Posts

Data Governance Star Wars: Balancing Bureaucracy And Agility

Council Data Governance

DQ-View: Roman Ruts on the Road to Data Governance

The Data Governance Oratorio

Zig-Zag-Diagonal Data Governance

Data Governance and the Buttered Cat Paradox

Beware the Data Governance Ides of March

A Tale of Two G’s

The People Platform

Rise of the Datechnibus

The Collaborative Culture of Data Governance

Connect Four and Data Governance

The Role Of Data Quality Monitoring In Data Governance

Quality and Governance are Beyond the Data

Data Transcendentalism

Podcast: Data Governance is Mission Possible

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

Jack Bauer and Enforcing Data Governance Policies

The Prince of Data Governance

MacGyver: Data Governance and Duct Tape

The Diffusion of Data Governance

Data Governance Star Wars: Balancing Bureaucracy and Agility

I was recently discussing data governance best practices with Rob Karel, the well respected analyst at Forrester Research, and our conversation migrated to one of data governance’s biggest challenges — how to balance bureaucracy and business agility.

So Rob and I thought it would be fun to tackle this dilemma in a Star Wars themed debate across our individual blog platforms with Rob taking the position for Bureaucracy as the Empire and me taking the opposing position for Agility as the Rebellion.

(Yes, the cliché is true, conversations between self-proclaimed data geeks tend to result in Star Wars or Star Trek parallels.)

Disclaimer: Remember that this is a true debate format where Rob and I are intentionally arguing polar opposite positions with full knowledge that the reality is data governance success requires effectively balancing bureaucracy and agility.

Please take the time to read both of our blog posts, then we encourage your comments — and your votes (see the poll below).

Data Governance Star Wars

If you are having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Data Governance Star Wars

The Force is Too Strong with This One

“Don’t give in to Bureaucracy—that is the path to the Dark Side of Data Governance.”

Data governance requires the coordination of a complex combination of a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

When confronted by this phantom menace of complexity, many organizations believe that the only path to success must be command and control—institute a rigid bureaucracy to dictate policies, demand compliance, and dole out punishments.  This approach to data governance often makes policy compliance feel like imperial rule, and policy enforcement feel like martial law.

But beware.  Bureaucracy, command, control—the Dark Side of Data Governance are they.  Once you start down the dark path, forever will it dominate your destiny, consume your organization it will.

No Time to Discuss this as a Committee

“There is a great disturbance in the Data, as if millions of voices suddenly cried out for Governance but were suddenly silenced.  I fear something terrible has happened.  I fear another organization has started by creating a Data Governance Committee.”

Yes, it’s true—at some point, an official Data Governance Committee (or Council, or Board, or Galactic Senate) will be necessary.

However, one of the surest ways to guarantee the failure of a new data governance program is to start by creating a committee.  This is often done with the best of intentions, bringing together key stakeholders from all around the organization, representatives of each business unit and business function, as well as data and technology stakeholders.  But when you start by discussing data governance as a committee, you often never get data governance out of the committee (i.e., all talk, mostly arguing, no action).

Successful data governance programs often start with a small band of rebels (aka change agents) struggling to restore quality to some business-critical data, or struggling to resolve inefficiencies in a key business process.  Once news of their successful pilot project spreads, more change agents will rally to the cause—because that’s what data governance truly requires, not a committee, but a cause to believe in and fight for—especially after the Empire of Bureaucracy strikes back and tries to put down the rebellion.

Collaboration is the Data Governance Force

“Collaboration is what gives a data governance program its power.  Its energy binds us together.  Cooperative beings are we.  You must feel the Collaboration all around you, among the people, the data, the business process, the technology, everywhere.”

Many rightfully lament the misleading term “data governance” because it appears to put the emphasis on “governing data.”

Data governance actually governs the interactions among business processes, data, technology and, most important—people.  It is the organization’s people, empowered by high quality data and enabled by technology, who optimize business processes for superior corporate performance.  Data governance reveals how truly interconnected and interdependent the organization is, showing how everything that happens within the enterprise happens as a result of the interactions occurring among its people.

Data governance provides the framework for the communication and collaboration of business, data, and technical stakeholders, and establishes an enterprise-wide understanding of the roles and responsibilities involved, and the accountability required, to support the organization’s business activities and to materialize the value of the enterprise’s data as positive business impacts.

Enforcing data governance policies with command and control is the quick and easy path—to failure.  Principles, not policies, are what truly give a data governance program its power.  Communication and collaboration are the two most powerful principles.

“May the Collaboration be with your Data Governance program.  Always.”

Always in Motion is the Future

“Be mindful of the future, but not at the expense of the moment.  Keep your concentration here and now, where it belongs.”

Perhaps the strongest case against bureaucracy in data governance is the business agility that is necessary for an organization to survive and thrive in today’s highly competitive and rapidly evolving marketplace.  The organization must follow what works for as long as it works, but without being afraid to adjust as necessary when circumstances inevitably change.

Change is the only galactic constant, which is why data governance policies can never be cast in stone (or frozen in carbonite).

Will a well-implemented data governance strategy continue to be successful?  Difficult to see.  Always in motion is the future.  And this is why, when it comes to deliberately designing a data governance program for agility: “Do or do not.  There is no try.”

Click here to read Rob “Darth” Karel’s blog post entry in this data governance debate

Please feel free to also post a comment below and explain your vote or simply share your opinions and experiences.

Listen to Data Governance Star Wars on OCDQ Radio — In Part 1, Rob Karel and I discuss our blog mock debate, which is followed by a brief Star Wars themed intermission, and then in Part 2, Gwen Thomas joins us to provide her excellent insights.

Got Data Quality?

I have written many blog posts about how it’s neither a realistic nor a required data management goal to achieve data perfection, i.e., 100% data quality or zero defects.

Of course, this admonition logically invites the questions:

If achieving 100% data quality isn’t the goal, then what is?

99%?

98%?

As I was pondering these questions while grocery shopping, I walked down the dairy aisle casually perusing the wide variety of milk options, when the thought occurred to me that data quality issues have a lot in common with the fat content of milk.

The classification of the percentage of fat (more specifically butterfat) in milk varies slightly by country.  In the United States, whole milk is approximately 3.25% fat, whereas reduced fat milk is 2% fat, low fat milk is 1% fat, and skim milk is 0.5% fat.

Reducing the total amount of fat (especially saturated and trans fat) is a common recommendation for a healthy diet.  Likewise, reducing the total amount of defects (i.e., data quality issues) is a common recommendation for a healthy data management strategy.  However, just like it would be unhealthy to remove all of the fat from your diet (because some fatty acids are essential nutrients that can’t be derived from other sources), it would be unhealthy to attempt to remove all of the defects from your data.

So maybe your organization is currently drinking whole data (i.e., 3.25% defects or 96.75% data quality) and needs to consider switching to reduced defect data (i.e., 2% defects or 98% data quality), low defect data (i.e., 1% defects or 99% data quality), or possibly even skim data (i.e., 0.5% defects or 99.5% data quality).
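Extending the analogy into a hedged sketch (the cut-offs below simply mirror the milk-fat percentages above and are not a recommended standard), an organization could classify a dataset’s defect rate like this:

    def milk_grade(defect_rate):
        """Map a defect rate to the milk-fat analogy; cut-offs are illustrative only."""
        if defect_rate > 0.0325:
            return "beyond whole data: time to start skimming defects"
        elif defect_rate > 0.02:
            return "whole data (~3.25% defects, ~96.75% data quality)"
        elif defect_rate > 0.01:
            return "reduced defect data (~2% defects, ~98% data quality)"
        elif defect_rate > 0.005:
            return "low defect data (~1% defects, ~99% data quality)"
        return "skim data (~0.5% defects or better, ~99.5%+ data quality)"

    defects, records = 140, 10000
    print(milk_grade(defects / records))  # 1.4% defects -> reduced defect data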

No matter what your perspective is regarding the appropriate data quality goal for your organization, at the very least, I think that we can all agree that all of our enterprise data management initiatives have to ask the question: “Got Quality?”

 

Related Posts

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Thaler’s Apples and Data Quality Oranges

Data Quality and The Middle Way

Missed It By That Much

The Data Quality Goldilocks Zone

You Can’t Always Get the Data You Want

Data Quality Practices—Activate!

This is a screen capture of the results of last month’s unscientific poll about proactive data quality versus reactive data quality alongside one of my favorite (this is the third post I’ve used it in) graphics of the Wonder Twins (Zan and Jayna) with Gleek.

Although reactive (15 combined votes) easily defeated proactive (6 combined votes) in the poll, proactive versus reactive is one debate that will likely never end.  However, the debate makes it seem as if we are forced to choose one approach over the other.

Generally speaking, most recommended data quality practices advocate implementing proactive defect prevention and avoiding reactive data cleansing.  But as Graham Rhind commented, data quality is neither exclusively proactive nor exclusively reactive.

“And if you need proof, start looking at the data,” Graham explained.  “For example, gender.  To produce quality data, a gender must be collected and assigned proactively, i.e., at the data collection stage.  Gender coding reactively on the basis of, for example, name, only works correctly and with certainty in a certain percentage of cases (that percentage always being less than 100).  Reactive data quality in that case can never be the best practice because it can never produce the best data quality, and, depending on what you do with your data, can be very damaging.”

“On the other hand,” Graham continued, “the real world to which the data is referring changes.  People move, change names, grow old, die.  Postal code systems and telephone number systems change.  Place names change, countries come and go.  In all of those cases, a reactive process is the one that will improve data quality.”

“Data quality is a continuous process,” Graham concluded.  From his perspective, a realistic data quality practice advocates being “proactive as much as possible, and reactive to keep up with a dynamic world.  Works for me, and has done well for decades.”

I agree with Graham because, just like any complex problem, data quality has no fast and easy solution.  In my experience, a hybrid discipline is always required, combining proactive and reactive approaches into one continuous data quality practice.
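Before handing the last word to the Wonder Twins, here is a minimal sketch of that hybrid discipline (the validation rules, fields, and re-verification window are hypothetical): proactive validation rejects bad values at collection time, while a reactive pass flags records the real world has since made stale, which is the point Graham’s gender and address examples both illustrate.

    from datetime import date

    VALID_GENDERS = {"female", "male", "nonbinary", "undisclosed"}

    def collect_record(name, gender, postal_code):
        """Proactive: validate at the point of data collection, before storage."""
        if gender not in VALID_GENDERS:
            raise ValueError("Gender must be collected explicitly, not inferred later.")
        if not postal_code.strip():
            raise ValueError("Postal code is required at collection time.")
        return {"name": name, "gender": gender, "postal_code": postal_code,
                "last_verified": date.today()}

    def needs_reverification(record, max_age_days=365):
        """Reactive: the real world changes, so periodically re-verify stored records."""
        return (date.today() - record["last_verified"]).days > max_age_days

    record = collect_record("Pat Smith", "undisclosed", "02134")
    print(needs_reverification(record))  # False today; True once the record grows stale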

Or as Zan (representing Proactive) and Jayna (representing Reactive) would say: “Data Quality Practices—Activate!”

And as Gleek would remind us: “The best data quality practices remain continuously active.”

 

Related Posts

How active is your data quality practice?

The Data Quality Wager

The Dichotomy Paradox, Data Quality and Zero Defects

Retroactive Data Quality

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

The Dichotomy Paradox, Data Quality and Zero Defects

As Joseph Mazur explains in Zeno’s Paradox, the ancient Greek philosopher Zeno constructed a series of logical paradoxes to prove that motion is impossible, which today remain on the cutting edge of our investigations into the fabric of space and time.

One of the paradoxes is known as the Dichotomy:

“A moving object will never reach any given point, because however near it may be, it must always first accomplish a halfway stage, and then the halfway stage of what is left and so on, and this series has no end.  Therefore, the object can never reach the end of any given distance.”

Of course, this paradox sounds silly.  After all, a given point like the finish line in a race is reachable in real life, since people win races all the time.  However, in theory, the mathematics is maddeningly sound, since it creates an infinite series of steps between the starting point and the finish line—and an infinite number of steps creates a journey that can never end.

Furthermore, this theoretical race cannot even begin, since the recursive nature of the paradox proves that we could never complete even the first step, because that step has its own halfway stage, and so on.  Hence, the paradoxical conclusion is that any travel over any finite distance can neither be completed nor begun, and so all motion must be an illusion.  Some of the greatest minds in history (from Galileo to Einstein to Stephen Hawking) have tackled the Dichotomy Paradox—but without being able to disprove it.
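For reference, the halfway stages form a geometric series: every finite partial sum falls short of the full distance, even though the infinite sum equals it.

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = \sum_{n=1}^{\infty} \frac{1}{2^n} = 1, \qquad \text{while} \qquad \sum_{n=1}^{N} \frac{1}{2^n} = 1 - \frac{1}{2^N} < 1 \text{ for every finite } N.$$

That asymptote-like tension, always approaching but never arriving after any finite number of steps, is exactly what the rest of this post borrows for data quality.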

Data Quality and Zero Defects

The given point that many enterprise initiatives attempt to reach with data quality is 100% on a metric such as data accuracy.  Leaving aside (in this post) the fact that any data quality metric without a tangible business context provides no business value, 100% data quality (aka Zero Defects) is an unreachable destination—no matter how close you get or how long you try to reach it.

Zero Defects is a laudable goal—but its theory and practice come from manufacturing quality.  However, I have always been of the opinion, unpopular among some of my peers, that manufacturing quality and data quality are very different disciplines, and although there is much to be learned from studying the theories of manufacturing quality, I believe that brute-forcing those theories onto data quality is impractical and fundamentally flawed (and I’ve even said so in verse: To Our Data Perfectionists).

The given point that enterprise initiatives should actually be attempting to reach is data-driven solutions for business problems.

Advocates of Zero Defects argue that, in theory, defect-free data should be fit to serve as the basis for every possible business use, enabling a data-driven solution for any business problem.  However, in practice, business uses for data, as well as business itself, are always evolving.  Therefore, business problems are dynamic problems that do not have—nor do they require—perfect solutions.

Although the Dichotomy Paradox proves motion is theoretically impossible, our physical motion practically proves otherwise.  Has your data quality practice become motionless by trying to prove that Zero Defects is more than just theoretically possible?

How active is your data quality practice?

My recent blog post The Data Quality Wager received a provocative comment from Richard Ordowich that sparked another round of discussion and debate about proactive data quality versus reactive data quality in the LinkedIn Group for the IAIDQ.

“Data quality is a reactive practice,” explained Ordowich.  “Perhaps that is not what is professed in the musings of others or the desired outcome, but it is nevertheless the current state of the best practices.  Data profiling and data cleansing are after the fact data quality practices.  The data is already defective.  Proactive defect prevention requires a greater discipline and changes to organizational behavior that is not part of the current best practices.  This I suggest is wishful thinking at this point in time.”

“How can data quality practices,” C. Lwanga Yonke responded, “that do not include proactive defect prevention (with the required discipline and changes to organizational behavior) be considered best practices?  Seems to me a data quality program must include these proactive activities to be considered a best practice.  And from what I see, there are many such programs out there.  True, they are not the majority—but they do exist.”

After Ordowich requested real examples of proactive data quality practices, Jayson Alayay commented “I have implemented data quality using statistical process control techniques where expected volumes and ratios are predicted using forecasting models that self-adjust using historical trends.  We receive an alert when significant deviations from forecast are detected.  One of our overarching data quality goals is to detect a significant data issue as soon as it becomes detectable in the system.”

“It is possible,” replied Ordowich, “to estimate the probability of data errors in data sets based on the currency (freshness) and usage of the data.  The problem is this process does not identify the specific instances of errors just the probability that an error may exist in the data set.  These techniques only identify trends not specific instances of errors.  These techniques do not predict the probability of a single instance data error that can wreak havoc.  For example, the ratings of mortgages was a systemic problem, which data quality did not address.  Yet the consequences were far and wide.  Also these techniques do not predict systemic quality problems related to business policies and processes.  As a result, their direct impact on the business is limited.”

“For as long as human hands key in data,” responded Alayay, “a data quality implementation to a great extent will be reactive.  Improving data quality not only pertains to detection of defects, but also enhancement of content, e.g., address standardization, geocoding, application of rules and assumptions to replace missing values, etc.  With so many factors in play, a real life example of a proactive data quality implementation that suits what you’re asking for may be hard to pinpoint.  My opinion is that the implementation of ‘comprehensive’ data quality programs can have big rewards and big risks.  One big risk is that it can slow time-to-market and kill innovation because otherwise talented people would be spending a significant amount of their time complying with rules and standards in the name of improving data quality.”

“When an organization embarks on a new project,” replied Ordowich, “at what point in the conversation is data quality discussed?  How many marketing plans, new product development plans, or even software development plans have you seen include data quality?  Data quality is not even an afterthought in most organizations, it is ignored.  Data quality is not even in the vocabulary until a problem occurs.  Data quality is not part of the culture or behaviors within most organizations.”
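For readers wondering what the statistical process control approach Alayay describes might look like, here is a minimal sketch (a simple moving-average forecast with a three-sigma control limit and made-up daily volumes), not his actual implementation:

    from statistics import mean, stdev

    def volume_alert(history, observed, window=7, sigmas=3.0):
        """Alert when today's record volume deviates significantly from a forecast
        that self-adjusts using a moving window of historical volumes."""
        recent = history[-window:]
        forecast = mean(recent)
        spread = stdev(recent) if len(recent) > 1 else 0.0
        if spread and abs(observed - forecast) > sigmas * spread:
            return f"ALERT: observed volume {observed} deviates from forecast {forecast:.0f}"
        return None

    daily_volumes = [1020, 1005, 998, 1012, 1030, 995, 1008]
    print(volume_alert(daily_volumes, observed=1015))  # within control limits -> None
    print(volume_alert(daily_volumes, observed=650))   # significant deviation -> ALERT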

 

 

Please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

 

Related Posts

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

Retroactive Data Quality