The Data Quality Wager

Gordon Hamilton emailed me with an excellent recommended topic for a data quality blog post:

“It always seems crazy to me that few executives base their ‘corporate wagers’ on the statistical research touted by data quality authors such as Tom Redman, Jack Olson and Larry English that shows that 15-45% of the operating expense of virtually all organizations is WASTED due to data quality issues.

So, if every organization is leaving 15-45% on the table each year, why don’t they do something about it?  Philip Crosby says that quality is free, so why do the executives allow the waste to go on and on and on?  It seems that if the shareholders actually think about the Data Quality Wager they might wonder why their executives are wasting their shares’ value.  A large portion of that 15-45% could all go to the bottom line without a capital investment.

I’m maybe sounding a little vitriolic because I’ve been re-reading Deming’s Out of the Crisis and he has a low regard for North American industry because they won’t move beyond their short-term goals to build a quality organization, let alone implement Deming’s 14 principles or Larry English’s paraphrasing of them in a data quality context.”

The Data Quality Wager

Gordon Hamilton explained in his email that his reference to the Data Quality Wager was an allusion to Pascal’s Wager, but what follows is my rendering of it in a data quality context (i.e., if you don’t like what follows, please yell at me, not Gordon).

Although I agree with Gordon, I also acknowledge that convincing your organization to invest in data quality initiatives can be a hard sell.  A common mistake is not framing the investment in data quality initiatives using business language such as mitigated risks, reduced costs, or increased revenue.  I also acknowledge the reality of the fiscal calendar effect and how most initiatives increase short-term costs based on the long-term potential of eventually mitigating risks, reducing costs, or increasing revenue.

Short-term increased costs of a data quality initiative can include the purchase of data quality software and its maintenance fees, as well as the professional services needed for training and consulting for installation, configuration, application development, testing, and production implementation.  And there are often additional short-term increased costs, both external and internal.

Please note that I am talking about the costs of proactively investing in a data quality initiative before any data quality issues have manifested that would prompt reactively investing in a data cleansing project.  Although the short-term increased costs are the same either way, I am simply acknowledging the reality that it is always easier for a reactive project to get funding than it is for a proactive program to get funding—and this is obviously not only true for data quality initiatives.

Therefore, the organization has to evaluate the possible outcomes of proactively investing in data quality initiatives while also considering the possible existence of data quality issues (i.e., the existence of tangible business-impacting data quality issues):

  1. Invest in data quality initiatives + Data quality issues exist = Decreased risks and (eventually) decreased costs

  2. Invest in data quality initiatives + Data quality issues do not exist = Only increased costs — No ROI

  3. Do not invest in data quality initiatives + Data quality issues exist = Increased risks and (eventually) increased costs

  4. Do not invest in data quality initiatives + Data quality issues do not exist = No increased costs and no increased risks

Data quality professionals, vendors, and industry analysts all strongly advocate #1 — and all strongly criticize #3.  (Additionally, since we believe data quality issues exist, most “orthodox” data quality folks generally refuse to even acknowledge #2 and #4.)

Unfortunately, when advocating #1, we often don’t effectively sell the business benefits of data quality, and when criticizing #3, we often focus too much on the negative aspects of not investing in data quality.

Only #4 “guarantees” neither increased costs nor increased risks by gambling on not investing in data quality initiatives based on the belief that data quality issues do not exist—and, by default, this is how many organizations make the Data Quality Wager.
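To make the wager concrete, here is a minimal sketch in Python that treats the four outcomes above as an expected-cost calculation.  Every number in it is a hypothetical placeholder (the probability, the operating expense, the waste rates, and the initiative cost are all assumptions for illustration, not figures from Redman, Olson, or English):

```python
# A hypothetical sketch of the Data Quality Wager as an expected-cost calculation.
# All values below are made-up placeholders for illustration only.

p_issues = 0.85                  # assumed probability that business-impacting data quality issues exist
operating_expense = 50_000_000   # assumed annual operating expense
waste_rate = 0.15                # assumed fraction of operating expense wasted if issues go unaddressed
initiative_cost = 1_000_000      # assumed annual cost of the data quality initiative
residual_waste_rate = 0.03       # assumed waste remaining even after investing

def expected_cost(invest: bool) -> float:
    """Expected annual cost of data quality issues plus any initiative spend."""
    if invest:
        cost_if_issues = initiative_cost + operating_expense * residual_waste_rate  # outcome 1
        cost_if_clean = initiative_cost                                             # outcome 2
    else:
        cost_if_issues = operating_expense * waste_rate                             # outcome 3
        cost_if_clean = 0.0                                                         # outcome 4
    return p_issues * cost_if_issues + (1 - p_issues) * cost_if_clean

print(f"Expected annual cost if we invest:       ${expected_cost(True):,.0f}")
print(f"Expected annual cost if we do not invest: ${expected_cost(False):,.0f}")
```

Under these made-up numbers the wager favors investing, but the point of the exercise is that each organization has to plug in its own estimates before deciding which way to bet.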

How is your organization making the Data Quality Wager?

Zig-Zag-Diagonal Data Governance

This is a screen capture of the results of last month’s unscientific poll about the best way to approach data governance.  Data governance requires executive sponsorship and a data governance board for the top-down-driven activities of funding, policy making and enforcement, decision rights, and arbitration of conflicting business priorities as well as organizational politics.  However, it also requires data stewards and other grass roots advocates for the bottom-up-driven activities of policy implementation, data remediation, and process optimization, all led by the example of peer-level change agents adopting the organization’s new best practices for data quality management, business process management, and technology management.

Hybrid Approach (starting Top-Down) won by a slim margin, but overall the need for a hybrid approach to data governance was the prevailing consensus opinion, with the only real debate being whether to start data governance top-down or bottom-up.

 

Commendable Comments

Rob Drysdale commented: “Too many companies get paralyzed thinking about how to do this and implement it. (Along with the overwhelmed feeling that it is too much time/effort/money to fix it.)  But I think your poll needs another option to vote on, specifically: ‘Whatever works for the company/culture/organization’ since not all solutions will work for every organization.  In some where it is highly structured, rigid and controlled, there wouldn’t be the freedom at the grass-roots level to start something like this and it might be frowned upon by upper-level management.  In other organizations that foster grass-roots things then it could work.  However, no matter which way you can get it started and working, you need to have buy-in and commitment at all levels to keep it going and make it effective.”

Paul Fulton commented: “I definitely agree that it needs to be a combination of both.  Data Governance at a senior level making key decisions to provide air cover and Data Management at the grass-roots level actually making things happen.”

Jill Wanless commented: “Our organization has taken the Hybrid Approach (starting Bottom-Up) and it works well for two reasons: (1) the worker bee rock stars are all aligned and ready to hit the ground running, and (2) the ‘Top’ can sit back and let the ‘aligned’ worker bees get on with it.  Of course, this approach is sometimes (painfully) slow, but with the ground-level rock stars already aligned, there is less resistance implementing the policies, and the Top’s heavy hand is needed much less frequently, but I voted for Hybrid Approach (starting Top-Down) because I have less than stellar patience for the long and scenic route.”

 

Zig-Zag-Diagonal Data Governance

I definitely agree with Rob’s well-articulated points that corporate culture is the most significant variable with data governance since it determines whether starting top-down or bottom-up is the best approach for a particular organization—and no matter which way you get started, you eventually need buy-in and commitment at all levels to keep it going and make it effective.

I voted for Hybrid Approach (starting Bottom-Up) since I have seen more data governance programs get successfully started because of the key factor of grass-roots alignment minimizing resistance to policy implementation, as Jill’s comment described.

And, of course, I agree with Paul’s remark that eventually data governance will require a combination of both top-down and bottom-up aspects.  At certain times during the evolution of a data governance program, top-down aspects will be emphasized, and at other times, bottom-up aspects will be emphasized.  However, it is unlikely that any long-term success can be sustained by relying exclusively on either a top-down-only or a bottom-up-only approach to data governance.

Let’s stop debating top-down versus bottom-up data governance—and start embracing Zig-Zag-Diagonal Data Governance.

 

Data Governance “Next Practices”

Phil Simon and I co-host and co-produce the wildly popular podcast Knights of the Data Roundtable, a bi-weekly data management podcast sponsored by the good folks at DataFlux, a SAS Company.

On Episode 5, our special guest, best-practice expert, and all-around industry thought leader Jill Dyché discussed her excellent framework for data governance “next practices” called The 5 + 2 Model.

 

Related Posts

Beware the Data Governance Ides of March

Data Governance and the Buttered Cat Paradox

Twitter, Data Governance, and a #ButteredCat #FollowFriday

A Tale of Two G’s

The Collaborative Culture of Data Governance

Connect Four and Data Governance

Quality and Governance are Beyond the Data

Podcast: Data Governance is Mission Possible

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

Jack Bauer and Enforcing Data Governance Policies

The Prince of Data Governance

MacGyver: Data Governance and Duct Tape

Data Governance and the Buttered Cat Paradox

One of the most common questions about data governance is:

What is the best way to approach it—top-down or bottom-up?

The top-down approach is where executive sponsorship and the role of the data governance board are emphasized.

The bottom-up approach is where data stewardship and the role of peer-level data governance change agents are emphasized.

This debate reminds me of the buttered cat paradox (shown to the left as illustrated by Greg Williams), which is a thought experiment combining the two common adages: “cats always land on their feet” and “buttered toast always lands buttered side down.”

In other words, if you strapped buttered toast (butter side up) on the back of a cat and then dropped it from a high height (Please Note: this is only a thought experiment, so no cats or toast are harmed), presumably the very laws of physics would be suspended, leaving our fearless feline of the buttered-toast-paratrooper brigade hovering forever in midair, spinning in perpetual motion, as both the buttered side of the toast and the cat’s feet attempt to land on the ground.

It appears that the question of either a top-down or a bottom-up approach with data governance poses a similar paradox.

Data governance will require executive sponsorship and a data governance board for the top-down-driven activities of funding, policy making and enforcement, decision rights, and arbitration of conflicting business priorities as well as organizational politics.

However, data governance will also require data stewards and other grass roots advocates for the bottom-up-driven activities of policy implementation, data remediation, and process optimization, all led by the example of peer-level change agents adopting the organization’s new best practices for data quality management, business process management, and technology management.

Therefore, recognizing the eventual need for aspects of both a top-down and a bottom-up approach with data governance can leave an organization at a loss to understand where to begin, hovering forever in mid-decision, spinning in perpetual thought, unable to land a first footfall on their data governance journey—and afraid of falling flat on the buttered side of their toast.

Although data governance is not a thought experiment, planning and designing your data governance program does require thought, and perhaps some experimentation, in order to discover what will work best for your organization’s corporate culture.

What do you think is the best way to approach data governance? Please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

Thaler’s Apples and Data Quality Oranges

In the opening chapter of his book Carrots and Sticks, Ian Ayres recounts the story of Thaler’s Apples:

“The behavioral revolution in economics began in 1981 when Richard Thaler published a seven-page letter in a somewhat obscure economics journal, which posed a pretty simple choice about apples.

Which would you prefer:

(A) One apple in one year, or

(B) Two apples in one year plus one day?

This is a strange hypothetical—why would you have to wait a year to receive an apple?  But choosing is not very difficult; most people would choose to wait an extra day to double the size of their gift.

Thaler went on, however, to pose a second apple choice.

Which would you prefer:

(C) One apple today, or

(D) Two apples tomorrow?

What’s interesting is that many people give a different, seemingly inconsistent answer to this second question.  Many of the same people who are patient when asked to consider this choice a year in advance turn around and become impatient when the choice has immediate consequences—they prefer C over D.

What was revolutionary about his apple example is that it illustrated the plausibility of what behavioral economists call ‘time-inconsistent’ preferences.  Richard was centrally interested in the people who chose both B and C.  These people, who preferred two apples in the future but one apple today, flipped their preferences as the delivery date got closer.”

What does this have to do with data quality?  Give me a moment to finish eating my second apple, and then I will explain . . .

 

Data Quality Oranges

Let’s imagine that an orange represents a unit of measurement for data quality, somewhat analogous to data accuracy, such that the more data quality oranges you have, the better the quality of data is for your needs—let’s say for making a business decision.

Which would you prefer:

(A) One data quality orange in one month, or

(B) Two data quality oranges in one month plus one day?

(Please Note: Due to the strange uncertainties of fruit-based mathematics, two data quality oranges do not necessarily equate to a doubling of data accuracy, but two data quality oranges are certainly an improvement over one data quality orange).

Now, of course, on those rare occasions when you can afford to wait a month or so before making a critical business decision, most people would choose to wait an extra day in order to improve their data quality before making their data-driven decision.

However, let’s imagine you are feeling squeezed by a more pressing business decision—now which would you prefer:

(C) One data quality orange today, or

(D) Two data quality oranges tomorrow?

In my experience with data quality and business intelligence, most people prefer B over A—and C over D.

This “time-inconsistent” data quality preference within business intelligence reflects the reality that with the speed at which things change these days, more real-time business decisions are required—perhaps making speed more important than quality.
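For the quantitatively inclined, behavioral economists often model this kind of preference flip with hyperbolic discounting, where a delay in the immediate future feels far more costly than the same delay a year from now.  Here is a minimal sketch of that idea; the discount function and the parameter k are illustrative assumptions, not anything from Thaler’s letter:

```python
# Illustrative sketch: hyperbolic discounting reproduces the B-and-C preference flip.
# The discount function and the parameter k are assumptions chosen for illustration.

def perceived_value(units: float, delay_in_days: float, k: float = 1.5) -> float:
    """Perceived present value of a reward received after a delay."""
    return units / (1 + k * delay_in_days)

# Choosing a year in advance: one apple in 365 days vs. two apples in 366 days.
print(perceived_value(1, 365), perceived_value(2, 366))  # two apples win, so choose B

# Choosing with immediate consequences: one apple today vs. two apples tomorrow.
print(perceived_value(1, 0), perceived_value(2, 1))      # one apple today wins, so choose C
```

Applied to data quality oranges, the same preferences mean that waiting a day for better data feels cheap when the decision is a month away, but expensive when the decision is due today.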

In a recent Data Knights Tweet Jam, Mark Lorion pondered speed versus quality within business intelligence, asking: “Is it better to be perfect in 30 days or 70% today?  Good enough may often be good enough.”

To which Henrik Liliendahl Sørensen responded with the perfectly pithy wisdom: “Good, Fast, Decision—Pick any two.”

However, Steve Dine cautioned that speed versus quality is decision dependent: “70% is good when deciding how many pencils to order, but maybe not for a one billion dollar acquisition.”

Mark’s follow-up captured the speed versus quality tradeoff succinctly with “Good Now versus Great Later.”  And Henrik added the excellent cautionary note: “Good decision now, great decision too late—especially if data quality is not a mature discipline.”

 

What Say You?

How many data quality oranges do you think it takes?  Or for those who prefer a less fruitful phrasing, where do you stand on the speed versus quality debate?  How good does data quality have to be in order to make a good data-driven business decision?

 

Related Posts

To Our Data Perfectionists

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

Data Quality and the Cupertino Effect

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Data In, Decision Out

The Data-Decision Symphony

Data!

You Can’t Always Get the Data You Want

Alternatives to Enterprise Data Quality Tools

The recent analysis by Andy Bitterer of Gartner Research (and ANALYSTerical) about the acquisition of the open source data quality tool DataCleaner by the enterprise data quality vendor Human Inference prompted the following Twitter conversation:

Since enterprise data quality tools can be cost-prohibitive, more prospective customers are exploring free and/or open source alternatives, such as the Talend Open Profiler, licensed under the open source General Public License, or non-open source, but entirely free alternatives, such as the Ataccama DQ Analyzer.  And, as Andy noted in his analysis, both of these tools offer an easy transition to the vendors’ full-fledged commercial data quality tools, offering more than just data profiling functionality.

As Henrik Liliendahl Sørensen explained in his blog post Data Quality Tools Revealed, data profiling is the technically easiest part of data quality, which explains the tool diversity and the early adoption of free and/or open source alternatives.
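To give a sense of what that technically easiest part looks like, here is a minimal sketch of basic column profiling using pandas.  It is only an illustration of the kind of summary statistics profiling tools produce (row counts, null counts, distinct values), and is not based on how Talend Open Profiler, DataCleaner, or DQ Analyzer are actually implemented:

```python
# Minimal illustration of basic column profiling, the kind of summary
# that data profiling tools automate; not based on any specific tool.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column counts of rows, nulls, and distinct values."""
    return pd.DataFrame({
        "rows": df.shape[0],
        "nulls": df.isna().sum(),
        "distinct": df.nunique(),
        "null_pct": (df.isna().mean() * 100).round(1),
    })

# Hypothetical sample data, chosen only to show typical findings (missing values,
# duplicate values, and inconsistent formatting in postal codes).
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "email": ["a@example.com", None, "c@example.com", "c@example.com"],
    "postal_code": ["02134", "2134", None, "02134"],
})

print(profile(customers))
```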

And there are also other non-open source alternatives that are more affordable than enterprise data quality tools, such as Datamartist, which combines data profiling and data migration capabilities into an easy-to-use desktop application.

My point is neither to discourage the purchase of enterprise data quality tools, nor promote their alternatives—and this blog post is certainly not an endorsement—paid or otherwise—of the alternative data quality tools I have mentioned simply as examples.

My point is that many new technology innovations originate from small entrepreneurial ventures, which tend to be specialists with a narrow focus that can provide a great source of rapid innovation.  This is in contrast to the data management industry trend of innovation via acquisition and consolidation, embedding data quality technology within data management platforms that also provide data integration and master data management (MDM) functionality, allowing the mega-vendors to offer end-to-end solutions and the convenience of one-vendor information technology shopping.

However, most software licenses for these enterprise data management platforms start in the six figures.  On top of the licensing, you have to add the annual maintenance fees, which are usually in the five figures.  Add to the total cost of the solution the professional services needed for training and consulting for installation, configuration, application development, testing, and production implementation—and you have another six-figure annual investment.

Debates about free and/or open source software usually focus on the robustness of functionality and the intellectual property of source code.  However, from my perspective, I think that the real reason more prospective customers are exploring these alternatives to enterprise data quality tools is because of the free aspect—but not because of the open source aspect.

In other words—and once again I am only using it as an example—I might download Talend Open Profiler because I want data profiling functionality at an affordable price—but not because I want the opportunity to customize its source code.

I believe the “try it before you buy it” aspect of free and/or open source software is what’s important to prospective customers.

Therefore, enterprise data quality vendors, instead of acquiring an open source tool as Human Inference did with DataCleaner, how about offering a free (with limited functionality) or trial version of your enterprise data quality tool as an alternative option?

 

Related Posts

Do you believe in Magic (Quadrants)?

Can Enterprise-Class Solutions Ever Deliver ROI?

Which came first, the Data Quality Tool or the Business Need?

Selling the Business Benefits of Data Quality

What Data Quality Technology Wants

Has Data Become a Four-Letter Word?

In her excellent blog post 'The Bad Data Ate My Homework' and Other IT Scapegoating, Loraine Lawson explained how “there are a lot of problems that can be blamed on bad data.  I suspect it would be fair to say that there’s a good percentage of problems we don’t even know about that can be blamed on bad data and a lack of data integration, quality and governance.”

Lawson examined whether bad data could have been the cause of the bank foreclosure fiasco, as opposed to what she concludes were the more realistic causes: bad business and negligence, which, if not addressed, could lead to another global financial crisis.

“Bad data,” Lawson explained, “might be the most ubiquitous excuse since ‘the dog ate my homework.’  But while most of us would laugh at the idea of blaming the dog for missing homework, when someone blames the data, we all nod our heads in sympathy, because we all know how troublesome computers are.  And then the buck gets (unfairly) passed to IT.”

Unfairly blaming IT, or technology in general, when poor data quality negatively impacts business performance ignores the organization’s collective ownership of its problems and its shared responsibility for the solutions to those problems.  It also causes what Lawson described in Data’s Conundrum: Everybody Wants Control, Nobody Wants Responsibility as an “unresolved conflict on both the business and the IT side over data ownership and its related issues, from stewardship to governance.”

In organizations suffering from this unresolved conflict between IT and the Business—a dysfunctional divide also known as the IT-Business Chasm—bad data becomes the default scapegoat used by both sides.

Perhaps, in a strange way, placing the blame on bad data is progress when compared with the historical notions of data denial, when an organization’s default was to claim that it had no data quality issues whatsoever.

However, admitting that bad data not only exists, but is also having a tangible negative impact on business performance, doesn’t seem to have motivated organizations to take action.  Instead, many appear to prefer practicing bad data blamestorming, where the Business blames bad data on IT and its technology, and IT blames bad data on the Business and its business processes.

Or perhaps, by default, everyone just claims that “the bad data ate my homework.”

Are your efforts to convince executive management that data needs to be treated like a five-letter word (“asset”) being undermined by the fact that data has become a four-letter word in your organization?

 

Related Posts

The Business versus IT—Tear down this wall!

Quality and Governance are Beyond the Data

Data In, Decision Out

The Data-Decision Symphony

The Reptilian Anti-Data Brain

Hell is other people’s data

Promoting Poor Data Quality

Who Framed Data Entry?

Data, data everywhere, but where is data quality?

The Circle of Quality

The Asymptote of Data Quality

In analytic geometry (according to Wikipedia), an asymptote of a curve is a line such that the distance between the curve and the line approaches zero as they tend to infinity.  The inspiration for my hand-drawn illustration was a similar one (not related to data quality) in the excellent book Linchpin: Are You Indispensable? by Seth Godin, which describes an asymptote as:

“A line that gets closer and closer and closer to perfection, but never quite touches.”

“As you get closer to perfection,” Godin explains, “it gets more and more difficult to improve, and the market values the improvements a little bit less.  Increasing your free-throw percentage from 98 to 99 percent may rank you better in the record books, but it won’t win any more games, and the last 1 percent takes almost as long to achieve as the first 98 percent did.”
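A toy curve makes the asymptote concrete.  The sketch below uses a made-up exponential saturation curve (an assumption chosen only for illustration, not a model of any real data quality program) to show how each additional unit of effort buys a smaller improvement:

```python
# Made-up illustration of the asymptote: quality approaches, but never reaches, perfection.
from math import exp

def quality(effort: float) -> float:
    """Hypothetical quality level as a function of effort; approaches 1.0 but never touches it."""
    return 1 - exp(-effort)

previous = 0.0
for effort in [1, 2, 3, 4, 5]:
    q = quality(effort)
    print(f"effort {effort}: quality {q:.4f} (improvement {q - previous:.4f})")
    previous = q
```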

The pursuit of data perfection is a common debate in data quality circles, where it is usually known by the motto:

“The data will always be entered right, the first time, every time.”

However, Henrik Liliendahl Sørensen has cautioned that even when this ideal can be achieved, we must still acknowledge the inconvenient truth that things change; Evan Levy has reminded us that data quality isn’t the same as data perfection; and David Loshin has used the Pareto principle to describe the point of diminishing returns in data quality improvements.

Chasing data perfection can be a powerful motivation, but it can also undermine the best of intentions.  Not only is it important to accept that the Asymptote of Data Quality can never be reached, but we must realize that data perfection was never the goal.

The goal is data-driven solutions for business problems—and these dynamic problems rarely have (or require) a perfect solution.

Data quality practitioners must strive for continuous data quality improvement, but always within the business context of data, and without losing themselves in the pursuit of a data-myopic ideal such as data perfection.

 

Related Posts

To Our Data Perfectionists

The Data-Decision Symphony

Is your data complete and accurate, but useless to your business?

Finding Data Quality

MacGyver: Data Governance and Duct Tape

You Can’t Always Get the Data You Want

What going to the dentist taught me about data quality

A Tale of Two Q’s

Data Quality and The Middle Way

Hyperactive Data Quality (Second Edition)

Missed It By That Much

The Data Quality Goldilocks Zone

What Data Quality Technology Wants

This is a screen capture of the results of last month’s unscientific data quality poll where it was noted that viewpoints about the role of data quality technology (i.e., what data quality technology wants) are generally split between two opposing perspectives:

  1. Technology enables a data quality process, but doesn’t obviate the need for people (e.g., data stewards) to remain actively involved and be held accountable for maintaining the quality of data.
  2. Technology automates a data quality process, and a well-designed and properly implemented technical solution obviates the need for people to be actively involved after its implementation.

 

Commendable Comments

Henrik Liliendahl Sørensen voted for enable, but commented that he likes to say it enables by automating the time-consuming parts, an excellent point which he further elaborated on in two of his recent blog posts: Automation and Technology and Maturity.

Garnie Bolling commented that he believes people will always be part of the process, especially since data quality has so many dimensions and trends, and although automated systems can deal with what he called fundamental data characteristics, an automated system can not change with trends or the ongoing evolution of data.

Frank Harland commented that automation can and has to take over the tedious bits of work (e.g., he wouldn’t want to type in all those queries that can be automated by data profiling tools), but to get data right, we have to get processes right, get data architecture right, get culture and KPIs right, and get a lot of the “right” people to do all the hard work that has to be done.

Chris Jackson commented that what an organization really needs is quality data processes not data quality processes, and once the focus is on treating the data properly rather than catching and remediating poor data, you can have a meaningful debate about the relative importance of well-trained and motivated staff vs. systems that encourage good data behavior vs. replacing fallible people with standard automated process steps.

Alexa Wackernagel commented that when it comes to discussions about data migration and data quality with clients, she often gets the requirement—or better to call it the dream—for automated processes, but the reality is that data handling needs easy accessible technology to enable data quality.

Thanks to everyone who voted and special thanks to everyone who commented.  As always, your feedback is greatly appreciated.

 

What Data Quality Technology Wants: Enable and Automate


“Data Quality Powers—Activate!”


“I’m sorry, Defect.  I’m afraid I can’t allow that.”

I have to admit that my poll question was flawed (as my friend HAL would say, “It can only be attributable to human error”).

Posing the question in an either/or context made it difficult for the important role of automation within data quality processes to garner many votes.  I agree with the comments above that the role of data quality technology is to both enable and automate.

As the Wonder Twins demonstrate, data quality technology enables Zan (i.e., technical people), Jayna (i.e., business people), and  Gleek (i.e., data space monkeys, er I mean, data people) to activate one of their most important powers—collaboration.

In addition to the examples described in the comments above, data quality technology automates proactive defect prevention by providing real-time services, which greatly minimize poor data quality at the multiple points of origin within the data ecosystem, because although it is impossible to prevent every problem before it happens, the more control enforced where data originates, the better the overall enterprise data quality will be—or as my friend HAL would say:

“Putting data quality technology to its fullest possible use is all any corporate entity can ever hope to do.”

Related Posts

What Does Data Quality Technology Want?

DQ-Tip: “Data quality tools do not solve data quality problems...”

Which came first, the Data Quality Tool or the Business Need?

Data Quality Industry: Problem Solvers or Enablers?

Data Quality Magic

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Pirates of the Computer: The Curse of the Poor Data Quality

DQ-BE: Single Version of the Time

Data Quality By Example (DQ-BE) is an OCDQ regular segment that provides examples of data quality key concepts.

Photo via Flickr by: Leo Reynolds

Like truth, beauty, and singing ability, data quality is in the eyes of the beholder.

Data’s quality is determined by evaluating its fitness for the purpose of use.  However, in the vast majority of cases, data has multiple uses, and data of sufficient quality for one use may not be of sufficient quality for other uses.

Therefore, to be more accurate, data quality is in the eyes of the user.

The perspective of the user provides a relative context for data quality.  Many argue an absolute context for data quality exists, one which is independent of the often conflicting perspectives of different users.

This absolute context is often referred to as a “Single Version of the Truth.”

As one example of the challenges inherent in this data quality key concept, let’s consider if there is a “Single Version of the Time.”

 

Single Version of the Time

I am writing this blog post at 10:00 AM.  I am using time in a relative context, meaning that from my perspective it is 10 o’clock in the morning.  I live in the Central Standard time zone (CST) of the United States. 

My friend in Europe would say that I am writing this blog post at 5:00 PM.  He is also using time in a relative context, meaning that from his perspective it is 5 o’clock in the afternoon.  My friend lives in the Central European time zone (CET).

We could argue that an absolute time exists, as defined by Coordinated Universal Time (UTC).  Local times around the world can be expressed as a relative time using positive or negative offsets from UTC.  For example, my relative time is UTC-6 and my friend’s relative time is UTC+1.  Alternatively, we could use absolute time and say that I am writing this blog post at 16:00 UTC.
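For the technically inclined, the relationship between the absolute time and its relative renderings is easy to demonstrate.  Here is a minimal sketch using Python’s standard zoneinfo module; the specific date and the IANA zone names are illustrative assumptions standing in for my time zone and my friend’s:

```python
# One shared (absolute) instant, rendered relative to two different time zones.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

instant = datetime(2011, 2, 1, 16, 0, tzinfo=timezone.utc)  # 16:00 UTC

print(instant.astimezone(ZoneInfo("America/Chicago")))    # 2011-02-01 10:00:00-06:00 (CST, UTC-6)
print(instant.astimezone(ZoneInfo("Europe/Copenhagen")))  # 2011-02-01 17:00:00+01:00 (CET, UTC+1)
```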

Although using an absolute time is an absolute necessity if, for example, my friend and I wanted to schedule a time to have a telephone (or Skype) discussion, it would be confusing to use UTC when referring to events relative to our local time zone.

In other words, the relative context of the user’s perspective is valid and an absolute context independent of the perspectives of different users is also valid—especially whenever a shared perspective is necessary in order to facilitate dialogue and discussion.

Therefore, instead of calling UTC a Single Version of the Time, we could call it a Shared Version of the Time, and when it comes to the data quality concept of a Single Version of the Truth, perhaps it’s time we started calling it a Shared Version of the Truth.

 

Related Posts

Single Version of the Truth

The Quest for the Golden Copy

Beyond a “Single Version of the Truth”

The Idea of Order in Data

DQ-BE: Data Quality Airlines

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality and the Cupertino Effect

DQ-Tip: “Data quality is primarily about context not accuracy...”

What Does Data Quality Technology Want?

During a recent Radiolab podcast, Kevin Kelly, author of the book What Technology Wants, used the analogy of how a flower leans toward sunlight because it “wants” the sunlight, to describe what the interweaving web of evolving technical innovations (what he refers to as the super-organism of technology) is leaning toward—in other words, what technology wants.

The other Radiolab guest was Steven Johnson, author of the book Where Good Ideas Come From, who somewhat dispelled the traditional notion of the eureka effect by explaining that the evolution of ideas, like all evolution, stumbles its way toward the next good idea, which inevitably leads to a significant breakthrough, such as what happens with innovations in technology.

Listening to this thought-provoking podcast made me ponder the question: What does data quality technology want?

In a previous post, I used the term OOBE-DQ to refer to the out-of-box-experience (OOBE) provided by data quality (DQ) tools, which usually becomes a debate between “ease of use” and “powerful functionality” after you ignore the Magic Beans sales pitch that guarantees you the data quality tool is both remarkably easy to use and incredibly powerful.

The data quality market continues to evolve away from esoteric technical tools and stumble its way toward the next good idea, which is business-empowering suites providing robust functionality with increasingly role-based user interfaces, which are tailored to the specific needs of different users.  Of course, many vendors would love to claim sole responsibility for what they would call significant innovations in data quality technology, instead of what are simply by-products of an evolving market.

The deployment of data quality functionality within and across organizations also continues to evolve, as data cleansing activities are being complemented by real-time defect prevention services used to greatly minimize poor data quality at the multiple points of origin within the enterprise data ecosystem.
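As a rough sketch of what a real-time defect prevention service can look like at a point of origin, here is an illustrative validation function.  The field names and rules are hypothetical assumptions chosen only to show the pattern of rejecting defective data before it enters the data ecosystem:

```python
# Hypothetical sketch of a real-time defect prevention check at a point of data entry.
# The rules and field names are illustrative assumptions, not a reference implementation.
import re

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(record: dict) -> list[str]:
    """Return a list of defect messages; an empty list means the record is accepted."""
    defects = []
    if not record.get("name", "").strip():
        defects.append("name is required")
    if not EMAIL_PATTERN.match(record.get("email", "")):
        defects.append("email is not well formed")
    if record.get("country") not in {"US", "CA", "GB", "DE", "DK"}:
        defects.append("country is not a supported code")
    return defects

print(validate_customer({"name": "Ada", "email": "ada@example.com", "country": "GB"}))  # []
print(validate_customer({"name": "", "email": "not-an-email", "country": "XX"}))        # 3 defects
```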

However, viewpoints about the role of data quality technology generally remain split between two opposing perspectives:

  1. Technology enables a data quality process, but doesn’t obviate the need for people (e.g., data stewards) to remain actively involved and be held accountable for maintaining the quality of data.
  2. Technology automates a data quality process, and a well-designed and properly implemented technical solution obviates the need for people to be actively involved after its implementation.

Do you think that continuing advancements and innovations in data quality technology will obviate the need for people to be actively involved in data quality processes?  In the future, will we have high quality data because our technology essentially wants it and therefore leans our organizations toward high quality data?  Let’s conduct another unscientific data quality poll:

 

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

 

Related Posts

DQ-Tip: “Data quality tools do not solve data quality problems...”

Which came first, the Data Quality Tool or the Business Need?

Data Quality Industry: Problem Solvers or Enablers?

Data Quality Magic

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Pirates of the Computer: The Curse of the Poor Data Quality

The Data Outhouse

This is a screen capture of the results of last week’s unscientific data quality poll, where it was noted that in many organizations a data warehouse is the only system where data from numerous and disparate operational sources has been integrated into a single system of record containing fully integrated and historical data.  Although the rallying cry and promise of the data warehouse has long been that it will serve as the source for most of the enterprise’s reporting and decision support needs, many data warehouses simply get ignored by the organization, which continues to rely on its data silos and spreadsheets for reporting and decision making.

Based on my personal experience, the most common reason is that these big boxes of data are often built with little focus on the quality of the data being delivered.  However, since that’s just my opinion, I launched the poll and invited your comments.

 

Commendable Comments

Stephen Putman commented that data warehousing “projects are usually so large that if you approach them in a big-bang, OLTP management fashion, the foundational requirements of the thing change between inception and delivery.”

“I’ve seen very few data warehouses live up to the dream,” Dylan Jones commented.  “I’ve always found that silos still persisted after a warehouse introduction because the turnaround on adding new dimensions and reports to the warehouse/mart meant that the business users simply had no option.  I think data quality obviously plays a part.  The business side only need to be burnt once or twice before they lose faith.  That said, a data warehouse is one of the best enablers of data quality motivation, so without them a lot of projects simply wouldn’t get off the ground.”

“I just voted Outhouse too,” commented Paul Drenth, “because I agree with Dylan that the business side keeps using other systems out of disappointment in the trustworthiness of the data warehouse.  I agree that bad data quality plays a role in that, but more often it’s also a lack of discipline in the organization which causes a downward spiral of missing information, and thus deciding to keep other information in a separate or local system.  So I think usability of data warehouse systems still needs to be improved significantly, also by adding invisible or automatic data quality assurance, the business might gain more trust.”

“Great point Paul, useful addition,” Dylan responded.  “I think discipline is a really important aspect, this ties in with change management.  A lot of business people simply don’t see the sense of urgency for moving their reports to a warehouse so lack the discipline to follow the procedures.  Or we make the procedures too inflexible.  On one site I noticed that whenever the business wanted to add a new dimension or category it would take a 2-3 week turnaround to sign off.  For a financial services company this was a killer because they had simply been used to dragging another column into their Excel spreadsheets, instantly getting the data they needed.  If we’re getting into information quality for a second, then the dimension of presentation quality and accessibility become far more important than things like accuracy and completeness.  Sure a warehouse may be able to show you data going back 15 years and cross validates results with surrogate sources to confirm accuracy, but if the business can’t get it in a format they need, then it’s all irrelevant.”

“I voted Data Warehouse,” commented Jarrett Goldfedder, “but this is marked with an asterisk.  I would say that 99% of the time, a data warehouse becomes an outhouse, crammed with data that serves no purpose.  I think terminology is important here, though.  In my previous organization, we called the Data Warehouse the graveyard and the people who did the analytics were the morticians.  And actually, that’s not too much of a stretch considering our job was to do CSI-type investigations and autopsies on records that didn’t fit with the upstream information.  This did not happen often, but when it did, we were quite grateful for having historical records maintained.  IMHO, if the records can trace back to the existing data and will save the organization money in the long-run, then the warehouse has served its purpose.”

“I’m having a difficult time deciding,” Corinna Martinez commented, “since most of the ones I have seen are high quality data, but not enough of it and therefore are considered Data Outhouses.  You may want to include some variation in your survey that covers good data but not enough; and bad data but lots to shift through in order to find something.”

“I too have voted Outhouse,” Simon Daniels commented, “and have also seen beautifully designed, PhD-worthy data warehouse implementations that are fundamentally of no practical use.  Part of the reason for this I think, particularly from a marketing point-of-view, which is my angle, is that how the data will be used is not sufficiently thought through.  In seeking to create marketing selections, segmentation and analytics, how will the insight locked-up in the warehouse be accessed within the context of campaign execution and subsequent response analysis?  Often sitting in splendid isolation, the data warehouse doesn’t offer the accessibility needed in day-to-day activities.”

Thanks to everyone who voted and special thanks to everyone who commented.  As always, your feedback is greatly appreciated.

 

Can MDM and Data Governance save the Data Warehouse?

During last week’s Informatica MDM Tweet Jam, Dan Power explained that master data management (MDM) can deliver to the business “a golden copy of the data that they can trust” and I remarked how companies expected that from their data warehouse.

“Most companies had unrealistic expectations from data warehouses,” Power responded, “which ended up being expensive, read-only, and updated infrequently.  MDM gives them the capability to modify the data, publish to a data warehouse, and manage complex hierarchies.  I think MDM offers more flexibility than the typical data warehouse.  That’s why business intelligence (BI) on top of MDM (or more likely, BI on top of a data warehouse that draws data from MDM) is so popular.”

As a follow-up question, I asked if MDM should be viewed as a complement or a replacement for the data warehouse.  “Definitely a complement,” Power responded. “MDM fills a void in the middle between transactional systems and the data warehouse, and does things that neither can do to data.”

In his recent blog post How to Keep the Enterprise Data Warehouse Relevant, Winston Chen explains that the data quality deficiencies of most data warehouses could be aided by MDM and data governance, which “can define and enforce data policies for quality across the data landscape.”  Chen believes that the data warehouse “is in a great position to be the poster child for data governance, and in doing so, it can keep its status as the center of gravity for all things data in an enterprise.”

I agree with Power that MDM can complement the data warehouse, and I agree with Chen that data governance can make the data warehouse (as well as many other things) better.  So perhaps MDM and data governance can save the data warehouse.

However, I must admit that I remain somewhat skeptical.  The same challenges that have caused most data warehouses to become data outhouses are also fundamental threats to the success of MDM and data governance.

 

Thinking outside the house

Just like real outhouses were eventually obsolesced by indoor plumbing, I wonder if data outhouses will eventually be obsolesced, perhaps ironically by emerging trends of outdoor plumbing, i.e., open source, cloud computing, and software as a service (SaaS).

Many industry analysts are also advocating the evolution of data as a service (DaaS), where data is taken out of all of its houses, meaning that the answer to my poll question might be neither data warehouse nor data outhouse.

Although none of these trends obviates the need for data quality or alleviates the other significant challenges mentioned above, perhaps when it comes to data, we need to start thinking outside the house.

 

Related Posts

DQ-Poll: Data Warehouse or Data Outhouse?

Podcast: Data Governance is Mission Possible

Once Upon a Time in the Data

The Idea of Order in Data

Fantasy League Data Quality

Which came first, the Data Quality Tool or the Business Need?

Finding Data Quality

The Circle of Quality

DQ-Poll: Data Warehouse or Data Outhouse?

In many organizations, a data warehouse is the only system where data from numerous and disparate operational sources has been integrated into a single repository of enterprise data.

The rapid delivery of a single system of record containing fully integrated and historical data to be used as the source for most of the enterprise’s reporting and decision support needs has long been the rallying cry and promise of the data warehouse.

However, I have witnessed beautifully architected, elegantly implemented, and diligently maintained data warehouses simply get ignored by the organization, which continues to rely on its data silos and spreadsheets for reporting and decision making.

The most common reason is that these big boxes of data are often built with little focus on the quality of the data being delivered.

But that’s just my opinion based on my personal experience.  So let’s conduct an unscientific poll.

 

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

Data Quality Industry: Problem Solvers or Enablers?

This morning I had the following Twitter conversation with Andy Bitterer of Gartner Research and ANALYSTerical, sparked by my previous post about Data Quality Magic, in which I posited that the one and only source of data quality magic is the people involved:

 

What Say You?

Although Andy and I were just joking around, there is some truth beneath these tweets.  After all, according to Gartner research, “the market for data quality tools was worth approximately $727 million in software-related revenue as of the end of 2009, and is forecast to experience a compound annual growth rate (CAGR) of 12% during the next five years.” 

So I thought I would open this up to a good-natured debate. 

Do you think the data quality industry (software vendors, consultants, analysts, and conferences) is working harder to solve the problem of poor data quality or to perpetuate the profitability of its continued existence?

All perspectives on this debate are welcome without bias.  Therefore, please post a comment below.

(Please Note: Comments advertising your products and services (or bashing your competitors) will NOT be approved.)

 

Related Posts

Which came first, the Data Quality Tool or the Business Need?

Do you believe in Magic (Quadrants)?

Can Enterprise-Class Solutions Ever Deliver ROI?

Promoting Poor Data Quality

The Once and Future Data Quality Expert

Imagining the Future of Data Quality

Darth Data

Darth Tater

While I was grocery shopping today, I couldn’t resist taking this picture of Darth Tater.

As the Amazon product review explains: “Be it a long time ago, in a galaxy far, far away or right here at home in the 21st century, Mr. Potato Head never fails to reinvent himself.”

I couldn’t help but think of how although data’s quality is determined by evaluating its fitness for the purpose of business use, most data has multiple business uses, and data of sufficient quality for one use may not be sufficient for other, perhaps unintended, business uses.

It is this “Reinventing data for mix and match business fun!” that often provides the context for what, in hindsight, appear to be obvious data quality issues.

It makes me wonder if it’s possible to turn high quality data to the dark side of the Force by misusing it for a business purpose for which it has no applicability, resulting in bad, albeit data-driven, business decisions.

Please post a comment and let me know if you think it is possible to turn Data-kin Quality-walker into Darth Data.

May the Data Quality be with you, always.

To Our Data Perfectionists

Had our organization but money enough, and time,
This demand for Data Perfection would be no crime.

We would sit down and think deep thoughts about all the wonderful ways,
To best model our data and processes, as slowly passes our endless days.
Freed from the Herculean Labors of Data Cleansing, we would sing the rhyme:
“The data will always be entered right, the first time, every time.”

We being exclusively Defect Prevention inclined,
Would only rubies within our perfected data find.
Executive Management would patiently wait for data that’s accurate and complete,
Since with infinite wealth and time, they would never fear the balance sheet.

Our vegetable enterprise data architecture would grow,
Vaster than empires, and more slow.

One hundred years would be spent lavishing deserved praise,
On our brilliant data model, upon which, with wonder, all would gaze.
Two hundred years to adore each and every defect prevention test,
But thirty thousand years to praise Juran, Deming, English, Kaizen, Six Sigma, and all the rest.
An age at least to praise every part of our flawless data quality methodology,
And the last age we would use to write our self-aggrandizing autobiography.

For our Corporate Data Asset deserves this Perfect State,
And we would never dare to love our data at any lower rate.

But at my back I always hear,
Time’s winged chariot hurrying near.

And if we do not address the immediate business needs,
Ignored by us while we were lost down in the data weeds.
Our beautiful enterprise data architecture shall no more be found,
After our Data Perfectionists’ long delay has run our company into the ground.

Because building a better tomorrow at the expense of ignoring today,
Has even with our very best of intentions, caused us to lose our way.
And all our quaint best practices will have turned to dust,
As burnt into ashes will be all of our business users’ trust.

Now, it is true that Zero Defects is a fine and noble goal,
For Manufacturing Quality—YES, but for Data Quality—NO.

We must aspire to a more practical approach, providing a critical business problem solving service,
Improving data quality, not for the sake of our data, but for the fitness of its business purpose.
Instead of focusing on only the bad we have done, forcing us to wear The Scarlet DQ Letter,
Let us focus on the good we are already doing, so from it we can learn how to do even better.

And especially now, while our enterprise-wide collaboration conspires,
To help us grow our Data Governance Maturity beyond just fighting fires.
Therefore, let us implement Defect Prevention wherever and whenever we can,
But also accept that Data Cleansing will always be an essential part of our plan.

Before our organization’s limited money and time are devoured,
Let us make sure that our critical business decisions are empowered.

Let us also realize that since change is the only universal constant,
Real best practices are not cast in stone, but written on parchment.
Because the business uses for our data, as well as our business itself, continues to evolve,
Our data strategy must be adaptation, allowing our dynamic business problems to be solved.

Thus, although it is true that we can never achieve Data Perfection,
We can deliver Business Insight, which always is our true direction.

___________________________________________________________________________________________________________________

This blog post was inspired by the poem To His Coy Mistress by Andrew Marvell.