Data Governance Frameworks are like Jigsaw Puzzles


In a recent interview, Jill Dyché addressed a common misconception by explaining that a data governance framework is not a strategy.  “Unlike other strategic initiatives that involve IT,” Jill explained, “data governance needs to be designed.  The cultural factors, the workflow factors, the organizational structure, the ownership, the political factors, all need to be accounted for when you are designing a data governance roadmap.”

“People need a mental model, that is why everybody loves frameworks,” Jill continued.  “But they are not enough and I think the mistake that people make is that once they see a framework, rather than understanding its relevance to their organization, they will just adapt it and plaster it up on the whiteboard and show executives without any kind of context.  So they are already defeating the purpose of data governance, which is to make it work within the context of your business problems, not just have some kind of mental model that everybody can agree on, but is not really the basis for execution.”

“So it’s a really, really dangerous trend,” Jill cautioned, “that we see where people equate strategy with framework because strategy is really a series of collected actions that result in some execution — and that is exactly what data governance is.”

And in her excellent article Data Governance Next Practices: The 5 + 2 Model, Jill explained that data governance requires a deliberate design so that the entire organization can buy into a realistic execution plan, not just a sound bite.  As usual, I agree with Jill, since, in my experience, many people expect a data governance framework to provide eureka-like moments of insight.

In The Myths of Innovation, Scott Berkun debunked the myth of the eureka moment using the metaphor of a jigsaw puzzle.

“When you put the last piece into place, is there anything special about that last piece or what you were wearing when you put it in?” Berkun asked.  “The only reason that last piece is significant is because of the other pieces you’d already put into place.  If you jumbled up the pieces a second time, any one of them could turn out to be the last, magical piece.”

“The magic feeling at the moment of insight, when the last piece falls into place,” Berkun explained, “is the reward for many hours (or years) of investment coming together.  In comparison to the simple action of fitting the puzzle piece into place, we feel the larger collective payoff of hundreds of pieces’ worth of work.”

Perhaps the myth of the data governance framework could also be debunked using the metaphor of a jigsaw puzzle.

Data governance requires coordinating a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, change management — and many other puzzle pieces.

How could a data governance framework possibly predict how you will assemble the puzzle pieces?  Or how the puzzle pieces will fit together within your unique corporate culture?  Or which of the many aspects of data governance will turn out to be the last (or even the first) piece of the puzzle to fall into place in your organization?  And, of course, there is truly no last piece of the puzzle, since data governance is an ongoing program because the business world constantly gets jumbled up by change.

So, data governance frameworks are useful, but only if you realize that data governance frameworks are like jigsaw puzzles.

Demystifying Social Media

In this eight-minute video, I attempt to demystify social media, which is often over-identified with the technology that enables it.  In fact, we have always been social, and we have always used media.  Social media is about human communication: humans communicating in the same ways they always have, by sharing images, memories, stories, and words.  Nowadays, we often communicate by sharing photographs, videos, and messages via social media status updates.

This video briefly discusses the three social media services used by my local Toastmasters club — Pinterest, Vimeo, and Twitter — and concludes with an analogy inspired by The Emerald City and The Yellow Brick Road from The Wizard of Oz:

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: Demystifying Social Media

You can also watch a regularly updated page of my videos by clicking on this link: OCDQ Videos

 

Social Karma Blog Series

 

Related Social Media Posts

Brevity is the Soul of Social Media

The Wisdom of the Social Media Crowd

The Challenging Gift of Social Media

Can Social Media become a Universal Translator?

The Two U’s and the Three C’s

Quality is more important than Quantity

Listening and Broadcasting

Please don’t become a Zombie

Information Quality Certified Professional

Information Quality Certified Professional (IQCP) is the new certification program from the IAIDQ.  The application deadline for the next certification exam is October 25, 2011.  For more information about IQCP certification, please refer to the following links:

 

Taking the first IQCP exam

A Guest Post written by Gordon Hamilton

I can still remember how galvanized I was by the first email mentions of the IQCP certification and its inaugural examination.  I’d been a member of the IAIDQ for the past year and I saw the first mailings in early February 2011.  It’s funny: my memory of the sequence of events was that I filled out the application for the examination that first night, but going back through my emails I see that I attended several IAIDQ webinars and followed quite a few discussions on LinkedIn before I finally applied and paid for the exam in mid-March (I still got the early bird discount).

Looking back now, I am wondering why I was so excited about the chance to become certified in data quality.  I know that I had been considering the CBIP and CBAP, from TDWI and IIBA respectively, for more than a year, going so far as to purchase study materials and take some sample exams.  Both the CBIP and CBAP designations fit where my career had been for 20+ years, but the subject areas were now tangential to my focus on information and data quality.

The IQCP certification fit exactly where I hoped my career trajectory was now taking me, so it really did galvanize me to action.

I had been a software and database developer for 20+ years when I caught a bad case of Deming-god worship while contracting at Microsoft in the early 2000s, and it only got worse as I started reading books by Olson, Redman, English, Loshin, John Morris, and Maydanchik on how data quality dovetailed with development methodologies of folks like Kimball and Inmon, which in turn dovetailed with the Lean Six Sigma methods.  I was on the slippery slope to choosing data quality as a career because those gurus of Data Quality, and Quality in general, were explaining, and I was finally starting to understand, why data warehouse projects failed so often, and why the business was often underwhelmed by the information product.

I had 3+ months to study and the resource center on the IAIDQ website had a list of recommended books and articles.  I finally had to live up to my moniker on Twitter of DQStudent.  I already had many of the books recommended by IAIDQ at home but hadn’t read them all yet, so while I waited for Amazon and AbeBooks to send me the books I thought were crucial, I began reading Deming, English, and Loshin.

Of all the books that began arriving on my doorstep, the most memorable was Journey to Data Quality by Richard Wang et al.

That book created a powerful image in my head of the information product “manufactured” by every organization.  That image of the “information product” made the suggestions by the data quality gurus much clearer.  They were showing how to apply quality techniques to the manufacture of Business Intelligence.  The image gave me a framework upon which to hang the other knowledge I was gathering about data quality, so it was easier to keep pushing through the books and articles because each new piece could fit somewhere in that manufacturing process.

I slept well the night before the exam, and gave myself plenty of time to make it to the Castle exam site that afternoon.  I took along several books on data quality, but hardly glanced at them.  Instead I grabbed a quick lunch and then a strong coffee to carry me through the 3-hour exam.  At 50 questions per hour, I was very conscious of how long each question was taking me, and every 10 questions or so I would check to see if I was going to run into time trouble.  It was obvious after 20 questions that I had plenty of time, so I began to get into a groove, finishing the exam 30 minutes early, which left plenty of time to review any questionable answers.

I found the exam eminently fair, with no tricky question constructions at all, so I didn’t fall into the over-thinking trap that I sometimes do.  Even better, the exam wasn’t the type that drilled deeper and deeper into my knowledge gaps when I missed a question.  Even though I felt confident that I had passed, I’ve got to tell you that the 6 weeks the IAIDQ took to determine the passing threshold on this inaugural exam and send out passing notifications were the longest 6 weeks I have spent in a long time.  Now that the passing mark is established, they swear that the notifications will be sent out much faster.

I still feel a warm glow as I think back on achieving IQCP certification.  I am proud to say that I am a data quality consultant and I have the certificate proving the depth and breadth of my knowledge.

Gordon Hamilton is a Data Quality, Data Warehouse, and IQCP certified professional, whose 30 years’ experience in the information business encompasses many industries, including government, legal, healthcare, insurance and financial.

 

Related Posts

Studying Data Quality

The Blue Box of Information Quality

Data, Information, and Knowledge Management

Are you turning Ugly Data into Cute Information?

The Dichotomy Paradox, Data Quality and Zero Defects

The Data Quality Wager

The Blue Box of Information Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode, Daragh O Brien and I discuss the Blue Box of Information Quality, which is much bigger on the inside, as well as using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”

Daragh O Brien is one of Ireland’s leading Information Quality and Governance practitioners.  After being born at a young age, Daragh has amassed a wealth of experience in quality information driven business change, from CRM Single View of Customer to Regulatory Compliance, to Governance and the taming of information assets to benefit the bottom line, manage risk, and ensure customer satisfaction.  Daragh O Brien is the Managing Director of Castlebridge Associates, one of Ireland’s leading consulting and training companies in the information quality and information governance space.

Daragh O Brien is a founding member and former Director of Publicity for the IAIDQ, which he is still actively involved with.  He was a member of the team that helped develop the Information Quality Certified Professional (IQCP) certification and he recently became the first person in Ireland to achieve this prestigious certification.

In 2008, Daragh O Brien was awarded a Fellowship of the Irish Computer Society for his work in developing and promoting standards of professionalism in Information Management and Governance.

Daragh O Brien is a regular conference presenter, trainer, blogger, and author with two industry reports published by Ark Group, the most recent of which is The Data Strategy and Governance Toolkit.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Studying Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode, Gordon Hamilton and I discuss data quality key concepts, including those which we have studied in some of our favorite data quality books, and more important, those which we have implemented in our careers as data quality practitioners.

Gordon Hamilton is a Data Quality and Data Warehouse professional, whose 30 years’ experience in the information business encompasses many industries, including government, legal, healthcare, insurance and financial.  Gordon was most recently engaged in the healthcare industry in British Columbia, Canada, where he continues to advise several health care authorities on data quality and business intelligence platform issues.

Gordon Hamilton’s passion is to bring together:

  • Exposure of business rules through data profiling as recommended by Ralph Kimball.

  • Monitoring business rules in the EQTL (Extract-Quality-Transform-Load) pipeline leading into the data warehouse.

  • Managing the business rule violations through systemic and specific solutions within the statistical process control framework of Shewhart/Deming.

  • Researching how to sustain data quality metrics as the “fit for purpose” definitions change faster than the information product process can easily adapt.
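
The Shewhart/Deming statistical process control framework Gordon mentions can be sketched for a single data quality metric.  This is a minimal illustration, assuming hypothetical function names, metrics, and thresholds:

```python
from statistics import mean, stdev

def control_limits(history, sigmas=3.0):
    """Compute Shewhart-style upper/lower control limits from historical metric values."""
    m, s = mean(history), stdev(history)
    return m - sigmas * s, m + sigmas * s

def quality_gate(history, observed):
    """Return True if the observed metric value is within control limits."""
    lo, hi = control_limits(history)
    return lo <= observed <= hi

# Daily null-rate (%) of a column feeding the data warehouse:
history = [2.1, 1.9, 2.3, 2.0, 2.2, 1.8, 2.1, 2.0]
print(quality_gate(history, 2.2))   # in control: True
print(quality_gate(history, 7.5))   # out of control: False
```

In an EQTL pipeline, a gate like this would sit in the Quality step, flagging a load for systemic (common cause) versus specific (special cause) investigation before the data reaches the warehouse.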

Gordon Hamilton’s moniker of DQStudent on Twitter hints at his plan to dovetail his Lean Six Sigma skills and experience with the data quality foundations to improve the manufacture of the “information product” in today’s organizations.  Gordon is a member of IAIDQ, TDWI, and ASQ, as well as an enthusiastic reader of anything pertaining to data.

Gordon Hamilton recently became an Information Quality Certified Professional (IQCP), via the IAIDQ certification program.

Recommended Data Quality Books

By no means a comprehensive list, and listed in no particular order whatsoever, the following books were either discussed during this OCDQ Radio episode, or are otherwise recommended for anyone looking to study data quality and its related disciplines:


DAMA International

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

DAMA International is a non-profit, vendor-independent, global association of technical and business professionals dedicated to advancing the concepts and practices of information and data management.

On this episode, special guest Loretta Mahon Smith provides an overview of the Data Management Body of Knowledge (DMBOK) and Certified Data Management Professional (CDMP) certification program.

Loretta Mahon Smith is a visionary and influential data management professional known for her consistent awareness of trends in the forefront of the industry.  Since 1983, she has worked in international financial services, and been actively involved in the maturity and growth of Information Architecture functions, specializing in Data Stewardship and Data Strategy Development.

Loretta Mahon Smith has been a member of DAMA for more than 10 years, with a lifetime membership to the DAMA National Capitol Region Chapter.  As President of the chapter, she has the opportunity to help the Washington DC and Baltimore data management communities.  She serves the world community through her involvement on the DAMA International Board as VP of Communications.  She additionally volunteers her time to work on the ICCP Certification Council, most recently working on the development of the Zachman and Data Governance examinations.

In the past, Loretta has facilitated Special Interest Group sessions on Governance and Stewardship and presented Stewardship training at numerous local chapters for DAMA, IIBA, TDWI, and ACM, as well as major conferences including Project World (IIBA), INFO360 (AIIM), EDW (DAMA) and the IQ.  She earned Certified Computing Professional (CCP), Certified Business Intelligence Professional (CBIP), and Certified Data Management Professional (CDMP) designations, achieving mastery level proficiency rating in Data Warehousing, Data Management, and Data Quality.


Data Quality Pro

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode, I am joined by special guest Dylan Jones, the community leader of Data Quality Pro, the largest membership resource dedicated entirely to the data quality profession.

Dylan is currently overseeing the re-build and re-launch of Data Quality Pro into a next generation membership platform, and during our podcast discussion, Dylan describes some of the great new features that will be coming soon to Data Quality Pro.

Links for Data Quality Pro and Dylan Jones:


How active is your data quality practice?

My recent blog post The Data Quality Wager received a provocative comment from Richard Ordowich that sparked another round of discussion and debate about proactive data quality versus reactive data quality in the LinkedIn Group for the IAIDQ.

“Data quality is a reactive practice,” explained Ordowich.  “Perhaps that is not what is professed in the musings of others or the desired outcome, but it is nevertheless the current state of the best practices.  Data profiling and data cleansing are after the fact data quality practices.  The data is already defective.  Proactive defect prevention requires a greater discipline and changes to organizational behavior that is not part of the current best practices.  This I suggest is wishful thinking at this point in time.”

“How can data quality practices,” C. Lwanga Yonke responded, “that do not include proactive defect prevention (with the required discipline and changes to organizational behavior) be considered best practices?  Seems to me a data quality program must include these proactive activities to be considered a best practice.  And from what I see, there are many such programs out there.  True, they are not the majority—but they do exist.”

After Ordowich requested real examples of proactive data quality practices, Jayson Alayay commented “I have implemented data quality using statistical process control techniques where expected volumes and ratios are predicted using forecasting models that self-adjust using historical trends.  We receive an alert when significant deviations from forecast are detected.  One of our overarching data quality goals is to detect a significant data issue as soon as it becomes detectable in the system.”
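
Alayay’s approach can be sketched with an exponentially weighted moving average (EWMA) as the self-adjusting forecast; the data, threshold, and names below are illustrative assumptions, not his actual implementation:

```python
def ewma_monitor(observations, alpha=0.3, threshold=0.5):
    """Yield (observed, forecast, alert) tuples; alert is True when the
    relative deviation from the forecast exceeds the threshold."""
    forecast = observations[0]
    for obs in observations[1:]:
        deviation = abs(obs - forecast) / forecast
        yield obs, round(forecast, 1), deviation > threshold
        # Self-adjust: fold the new observation into the forecast.
        forecast = alpha * obs + (1 - alpha) * forecast

daily_volumes = [1000, 1040, 980, 1020, 400, 1010]
for obs, forecast, alert in ewma_monitor(daily_volumes):
    if alert:
        print(f"ALERT: observed {obs}, expected ~{forecast}")
```

The normal day-to-day fluctuation never trips the alert, but the collapse to 400 records does, which is exactly the “detect a significant data issue as soon as it becomes detectable” goal Alayay describes.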

“It is possible,” replied Ordowich, “to estimate the probability of data errors in data sets based on the currency (freshness) and usage of the data.  The problem is this process does not identify the specific instances of errors, just the probability that an error may exist in the data set.  These techniques only identify trends, not specific instances of errors.  These techniques do not predict the probability of a single instance data error that can wreak havoc.  For example, the ratings of mortgages were a systemic problem, which data quality did not address, yet the consequences were far and wide.  Also, these techniques do not predict systemic quality problems related to business policies and processes.  As a result, their direct impact on the business is limited.”

“For as long as human hands key in data,” responded Alayay, “a data quality implementation to a great extent will be reactive.  Improving data quality not only pertains to detection of defects, but also enhancement of content, e.g., address standardization, geocoding, application of rules and assumptions to replace missing values, etc.  With so many factors in play, a real life example of a proactive data quality implementation that suits what you’re asking for may be hard to pinpoint.  My opinion is that the implementation of ‘comprehensive’ data quality programs can have big rewards and big risks.  One big risk is that it can slow time-to-market and kill innovation because otherwise talented people would be spending a significant amount of their time complying with rules and standards in the name of improving data quality.”

“When an organization embarks on a new project,” replied Ordowich, “at what point in the conversation is data quality discussed?  How many marketing plans, new product development plans, or even software development plans have you seen include data quality?  Data quality is not even an afterthought in most organizations, it is ignored.  Data quality is not even in the vocabulary until a problem occurs.  Data quality is not part of the culture or behaviors within most organizations.”
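
The content-enhancement practices Alayay mentions (standardization, rule-based replacement of missing values) can also be sketched; the reference tables and records below are illustrative assumptions, not from any specific toolkit:

```python
# Illustrative street-type normalization table:
STREET_ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

def standardize_address(address):
    """Uppercase the address and normalize common street-type words."""
    words = address.upper().split()
    return " ".join(STREET_ABBREVIATIONS.get(word, word) for word in words)

def fill_missing(record, defaults):
    """Apply rule-based defaults wherever a value is missing."""
    return {key: record.get(key) or defaults.get(key)
            for key in set(record) | set(defaults)}

print(standardize_address("123 Main Street"))  # 123 MAIN ST
print(fill_missing({"country": None, "city": "Cork"}, {"country": "IE"}))
```

Note that both functions are reactive in Ordowich’s sense: they repair content after the fact rather than preventing the defect at the point of entry.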

 

 

Please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

 

Related Posts

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

Retroactive Data Quality

#FollowFriday Spotlight: @DataQualityPro

FollowFriday Spotlight is an OCDQ regular segment highlighting someone you should follow—and not just Fridays on Twitter.

Links for Data Quality Pro and Dylan Jones:

Data Quality Pro, founded and maintained by Dylan Jones, is a free and independent community resource dedicated to helping data quality professionals take their career or business to the next level.  Data Quality Pro is your free expert resource providing data quality articles, webinars, forums and tutorials from the world’s leading experts, every day.

With the mission to create the most beneficial data quality resource that is freely available to members around the world, the goal of Data Quality Pro is “winning-by-sharing”: they believe that when members contribute a small amount of their experience, skill, or time to support other members, truly great things can be achieved.

Membership is 100% free and provides a broad range of additional content for professionals of all backgrounds and skill levels.

Check out the Best of Data Quality Pro, which includes the following great blog posts written by Dylan Jones in 2010:

 

Related Posts

#FollowFriday and Re-Tweet-Worthiness

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

Social Karma (Part 7) – Twitter

TDWI World Conference Orlando 2010

Last week I attended the TDWI World Conference held November 7-12 in Orlando, Florida at the Loews Royal Pacific Resort.

As always, TDWI conferences offer a variety of full-day and half-day courses designed for professionals, taught in an objective, vendor-neutral manner by in-the-trenches practitioners who are well known in the industry.

In this blog post, I summarize a few key points from two of the courses I attended.  I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

 

A Practical Guide to Analytics

Wayne Eckerson, author of the book Performance Dashboards: Measuring, Monitoring, and Managing Your Business, described the four waves of business intelligence:

  1. Reporting – What happened?
  2. Analysis – Why did it happen?
  3. Monitoring – What’s happening?
  4. Prediction – What will happen?

“Reporting is the jumping off point for analytics,” explained Eckerson, “but many executives don’t realize this.  The most powerful aspect of analytics is testing our assumptions.”  He went on to differentiate the two strains of analytics:

  1. Exploration and Analysis – Top-down and deductive, primarily uses query tools
  2. Prediction and Optimization – Bottom-up and inductive, primarily uses data mining tools

“A huge issue for predictive analytics is getting people to trust the predictions,” remarked Eckerson.  “Technology is the easy part, the hard part is selling the business benefits and overcoming cultural resistance within the organization.”

“The key is not getting the right answers, but asking the right questions,” he explained, quoting Ken Rudin of Zynga.

“Deriving insight from its unique information will always be a competitive advantage for every organization.”  He recommended the book Competing on Analytics: The New Science of Winning as a great resource for selling the business benefits of analytics.

 

Data Governance for BI Professionals

Jill Dyché, a partner and co-founder of Baseline Consulting, explained that data governance transcends business intelligence and other enterprise information initiatives such as data warehousing, master data management, and data quality.

“Data governance is the organizing framework,” explained Dyché, “for establishing strategy, objectives, and policies for corporate data.  Data governance is the business-driven policy making and oversight of corporate information.”

“Data governance is necessary,” remarked Dyché, “whenever multiple business units are sharing common, reusable data.”

“Data governance aligns data quality with business measures and acceptance, positions enterprise data issues as cross-functional, and ensures data is managed separately from its applications, thereby evolving data as a service (DaaS).”

In her excellent 2007 article Serving the Greater Good: Why Data Hoarding Impedes Corporate Growth, Dyché explained the need for “systemizing the notion that data – corporate asset that it is – belongs to everyone.”

“Data governance provides the decision rights around the corporate data asset.”

 

Related Posts

DQ-View: From Data to Decision

Podcast: Data Governance is Mission Possible

The Business versus IT—Tear down this wall!

MacGyver: Data Governance and Duct Tape

Live-Tweeting: Data Governance

Enterprise Data World 2010

Enterprise Data World 2009

TDWI World Conference Chicago 2009

Light Bulb Moments at DataFlux IDEAS 2010

DataFlux IDEAS 2009

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“There is no such thing as data accuracy — There are only assertions of data accuracy.”

This DQ-Tip came from the Data Quality Pro webinar ISO 8000 Master Data Quality featuring Peter Benson of ECCMA.

You can download (.pdf file) quotes from this webinar by clicking on this link: Data Quality Pro Webinar Quotes - Peter Benson

ISO 8000 is the international standard for data quality.  You can get more information by clicking on this link: ISO 8000

Data Accuracy

Thanks to substantial assistance from my readers, accuracy was defined in a previous post as the correctness of a data value within a limited context, such as verification by an authoritative reference (i.e., validity), combined with the correctness of a valid data value within an extensive context that includes other data as well as business processes (i.e., accuracy).
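This validity-versus-accuracy distinction can be sketched in a few lines of Python.  Everything below is illustrative: the country codes, field names, and the idea of a “verified” field are my own assumptions, not anything from the original definition or the ISO 8000 standard.

```python
# Validity: correctness within a limited context, e.g., verification
# against an authoritative reference (here, an assumed set of codes).
VALID_COUNTRY_CODES = {"US", "GB", "DK", "IE"}

def is_valid(country_code: str) -> bool:
    """A value is valid if it appears in the authoritative reference."""
    return country_code in VALID_COUNTRY_CODES

# Accuracy: correctness of a *valid* value within an extensive context —
# here, the rest of the (hypothetical) customer record.
def is_accurate(record: dict) -> bool:
    """A valid country code can still be inaccurate for this record:
    the customer may actually live somewhere else."""
    return is_valid(record["country"]) and record["country"] == record["verified_country"]

record = {"country": "US", "verified_country": "DK"}
print(is_valid(record["country"]))   # True  -- valid in isolation
print(is_accurate(record))           # False -- inaccurate in context
```

The point of the sketch is that the two checks can disagree: a value can pass verification against the reference yet still be the wrong value for this particular record.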

“The definition of data quality,” according to Peter and the ISO 8000 standards, “is the ability of the data to meet requirements.”

Although accuracy is only one of many dimensions of data quality, whenever we refer to data as accurate, we are referring to the ability of the data to meet specific requirements, and quite often the requirement is to support making a critical business decision.

I agree with Peter and the ISO 8000 standards because we can’t simply take an accuracy metric on a data quality dashboard (or however else the assertion is presented to us) at face value without understanding how the metric is both defined and measured.

However, even when well defined and properly measured, data accuracy is still only an assertion.  Oftentimes, the only way to verify the assertion is by putting the data to its intended use.

If by using it you discover that the data is inaccurate, then having established what the assertion of accuracy was based on gives you a head start on performing root cause analysis, enabling faster resolution of the issues, not only with the data, but also with the business and technical processes used to define and measure data accuracy.

El Festival del IDQ Bloggers (June and July 2010)

IAIDQ Blog Carnival 2010

Welcome to the June and July 2010 issue of El Festival del IDQ Bloggers, which is a blog carnival by the IAIDQ that offers a great opportunity for both information quality and data quality bloggers to get their writing noticed and to connect with other bloggers around the world.

 

Definition Drift

Graham Rhind submitted his July blog post Definition drift, which examines the persistent problems facing attempts to define a consistent terminology within the data quality industry. 

It is essential to the success of a data quality initiative that its key concepts are clearly defined and in a language that everyone can understand.  Therefore, I also recommend that you check out the free online data quality glossary built and maintained by Graham Rhind by following this link: Data Quality Glossary.

 

Lemonade Stand Data Quality

Steve Sarsfield submitted his July blog post Lemonade Stand Data Quality, which explains that data quality projects are a form of capitalism, meaning that you need to sell your customers a refreshing glass and keep them coming back for more.

 

What’s In a Given Name?

Henrik Liliendahl Sørensen submitted his June blog post What’s In a Given Name?, which examines a common challenge facing data quality, master data management, and data matching—namely (pun intended), how to automate the interpretation of the “given name” (aka “first name”) component of a person’s name separately from their “family name” (aka “last name”).
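As a hypothetical illustration of why this interpretation is hard to automate (the names and rule below are my own invented example, not from Henrik’s post), consider a naive rule that treats the first token of a full name as the given name.  It works for many Western names but fails for conventions where the family name comes first:

```python
def naive_split(full_name: str) -> tuple:
    """Naive rule: assume the first token is the given name
    and everything after it is the family name."""
    given, _, family = full_name.partition(" ")
    return given, family

# Works for a typical Western-order name:
print(naive_split("Henrik Liliendahl Sørensen"))  # ('Henrik', 'Liliendahl Sørensen')

# Fails for family-name-first conventions: in Korean naming order,
# 'Kim' is the family name, not the given name.
print(naive_split("Kim Jong-un"))  # ('Kim', 'Jong-un')
```

Reliable interpretation therefore needs more context than token position alone, such as cultural naming conventions or reference data.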

 

Solvency II Standards for Data Quality

Ken O’Connor submitted his July blog post Solvency II Standards for Data Quality, which explains that the Solvency II standards are common-sense data quality standards that can enable all organizations, regardless of their industry or region, to achieve complete, appropriate, and accurate data.

 

How Accuracy Has Changed

Scott Schumacher submitted his July blog post How Accuracy Has Changed, which explains that accuracy means being able to make the best use of all the information you have, putting data together where necessary, and keeping it apart where necessary.

 

Uniqueness is in the Eye of the Beholder

Marty Moseley submitted his June blog post Uniqueness is in the Eye of the Beholder, which beholds the challenge of uniqueness and identity matching, where determining whether data records should be matched is often a matter of differing perspectives among groups within an organization: what one group considers unique, another group considers a duplicate.
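A minimal sketch of that disagreement, using invented records and match rules of my own (not anything from Marty’s post): the same pair of records is unique under one group’s rule and a duplicate under another’s.

```python
def billing_match(a: dict, b: dict) -> bool:
    """Billing's (assumed) rule: same name and same address."""
    return a["name"] == b["name"] and a["address"] == b["address"]

def marketing_match(a: dict, b: dict) -> bool:
    """Marketing's (assumed) rule: same email address."""
    return a["email"] == b["email"]

r1 = {"name": "J. Smith", "address": "1 Main St", "email": "jsmith@example.com"}
r2 = {"name": "John Smith", "address": "1 Main St", "email": "jsmith@example.com"}

print(billing_match(r1, r2))    # False -- billing sees two unique customers
print(marketing_match(r1, r2))  # True  -- marketing sees a duplicate
```

Neither rule is wrong; each reflects what “the same customer” means for that group’s business process, which is exactly why uniqueness is in the eye of the beholder.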

 

Uniqueness in the Eye of the NSTIC

Jeffrey Huth submitted his July blog post Uniqueness in the Eye of the NSTIC, which examines a recently drafted document in the United States regarding a National Strategy for Trusted Identities in Cyberspace (NSTIC).

 

Profound Profiling

Daragh O Brien submitted his July blog post Profound Profiling, which recounts how data profiling has been cropping up in conversations and presentations he’s been making recently, even when the topic of the day wasn’t “Information Quality,” and shares his thoughts on the profound benefits of data profiling for organizations seeking to manage risk and ensure compliance.

 

Wanted: a Data Quality Standard for Open Government Data

Sarah Burnett submitted her July blog post Wanted: a Data Quality Standard for Open Government Data, which calls for the establishment of data quality standards for open government data (i.e., public data sets) since more of it is becoming available.

 

Data Quality Disasters in the Social Media Age

Dylan Jones submitted his July blog post The reality of data quality disasters in a social media age, which examines how bad news sparked by poor data quality travels faster and further than ever before, by using the recent story about the Enbridge Gas billing blunders as a practical lesson for all companies sitting on the data quality fence.

 

Finding Data Quality

Jim Harris (that’s me referring to myself in the third person) submitted my July blog post Finding Data Quality, which explains (with the help of the movie Finding Nemo) that although data quality is often discussed only in its relation to initiatives such as master data management, business intelligence, and data governance, eventually you’ll be finding data quality everywhere.

 

Editor’s Selections

In addition to the official submissions above, I selected the following great data quality blog posts published in June or July 2010:

 

Check out the past issues of El Festival del IDQ Bloggers

El Festival del IDQ Bloggers (May 2010) – edited by Castlebridge Associates

El Festival del IDQ Bloggers (April 2010) – edited by Graham Rhind

El Festival del IDQ Bloggers (March 2010) – edited by Phil Wright

El Festival del IDQ Bloggers (February 2010) – edited by William Sharp

El Festival del IDQ Bloggers (January 2010) – edited by Henrik Liliendahl Sørensen

El Festival del IDQ Bloggers (November 2009) – edited by Daragh O Brien

El Festival del IDQ Bloggers (October 2009) – edited by Vincent McBurney

El Festival del IDQ Bloggers (September 2009) – edited by Daniel Gent

El Festival del IDQ Bloggers (August 2009) – edited by William Sharp

El Festival del IDQ Bloggers (July 2009) – edited by Andrew Brooks

El Festival del IDQ Bloggers (June 2009) – edited by Steve Sarsfield

El Festival del IDQ Bloggers (May 2009) – edited by Daragh O Brien

El Festival del IDQ Bloggers (April 2009) – edited by Jim Harris

DQ-View: Is Data Quality the Sun?

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

DataQualityPro

This recent tweet by Dylan Jones of Data Quality Pro succinctly expresses a vitally important truth about the data quality profession.

Although few would debate the need for skill, some might doubt the need for passion.  Therefore, in this new DQ-View segment, I want to discuss why data quality initiatives require passionate data professionals.

 

DQ-View: Is Data Quality the Sun?

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

Data Gazers

Finding Data Quality

Oh, the Data You’ll Show!

Data Rock Stars: The Rolling Forecasts

The Second Law of Data Quality

The General Theory of Data Quality

DQ-Tip: “Start where you are...”

Sneezing Data Quality

The 2010 Data Quality Blogging All-Stars

The 2010 Major League Baseball (MLB) All-Star Game is being held tonight (July 13) at Angel Stadium in Anaheim, California.

For those readers who are not baseball fans, the All-Star Game is an annual exhibition held in mid-July that showcases the players with (for the most part) the best statistical performances during the first half of the MLB season.

Last summer, I began my own annual exhibition of showcasing the bloggers whose posts I have personally most enjoyed reading during the first half of the data quality blogging season. 

Therefore, this post provides links to stellar data quality blog posts that were published between January 1 and June 30 of 2010.  My definition of a “data quality blog post” also includes Data Governance, Master Data Management, and Business Intelligence. 

Please Note: There is no implied ranking in the order that bloggers or blogs are listed, other than that Individual Blog All-Stars are listed first, followed by Vendor Blog All-Stars, and the blog posts are listed in reverse chronological order by publication date.

 

Henrik Liliendahl Sørensen

From Liliendahl on Data Quality:

 

Dylan Jones

From Data Quality Pro:

 

Julian Schwarzenbach

From Data and Process Advantage Blog:

 

Rich Murnane

From Rich Murnane's Blog:

 

Phil Wright

From Data Factotum:

 

Initiate – an IBM Company

From Mastering Data Management:

 

Baseline Consulting

From their three blogs: Inside the Biz with Jill Dyché, Inside IT with Evan Levy, and In the Field with our Experts:

 

DataFlux – a SAS Company

From Community of Experts:

 

Related Posts

Recently Read: May 15, 2010

Recently Read: March 22, 2010

Recently Read: March 6, 2010

Recently Read: January 23, 2010

The 2009 Data Quality Blogging All-Stars

 

Additional Resources

From the IAIDQ, read the 2010 issues of the Blog Carnival for Information/Data Quality:

Microwavable Data Quality

Data quality is definitely not a one-time project, but instead requires a sustained program of enterprise-wide best practices that are best implemented within a data governance framework that “bakes in” defect prevention, data quality monitoring, and near real-time standardization and matching services—all ensuring high quality data is available to support daily business decisions.

However, implementing a data governance program is an evolutionary process requiring time and patience.

Baking and cooking also require time and patience.  Microwavable meals can be an occasional welcome convenience, and if you are anything like me (my condolences) and you can’t bake or cook, then microwavable meals can be an absolute necessity.

Data cleansing can also be an occasional (not necessarily welcome) convenience, or a relative necessity (i.e., a “necessary evil”).

Last year on Data Quality Pro, Dylan Jones hosted a great debate on the necessity of data cleansing, which is well worth reading, especially since the over 25 (and continuing) comments it received proves it is a polarizing topic for the data quality profession.

I reheated this debate (using the Data Quality Microwave, of course) earlier this year with my A Tale of Two Q’s blog post, which also received many commendable comments (though far fewer than Dylan’s blog post received, not that I am counting or anything).

Similarly, a heated debate can be had over the health implications of the microwave.  Eating too many microwavable meals is certainly not healthy, but I have many friends and family who would argue quite strongly for either side of this “food fight.”

Both of these great debates can be as deeply polarizing as Pepsi vs. Coke and Soccer vs. Football.  Just for the official record, I am firmly for both Pepsi and Football—and by Football, I mean NFL Football—and firmly against both Coke and Soccer. 

Just as I advocate that everyone (myself included) should learn how to cook, but still accept the eternal reality of the microwave, I definitely advocate the implementation of a data governance program, but I also accept the eternal reality of data cleansing.   

However, my lawyers have advised me to report that beta testing for an actual Data Quality Microwave has not been promising.

 

Related Posts

A Tale of Two Q’s

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

 

Follow OCDQ

If you enjoyed this blog post, then please subscribe to OCDQ via my RSS feed, my E-mail updates, or Google Reader.

You can also follow OCDQ on Twitter, fan the Facebook page for OCDQ, and connect with me on LinkedIn.