Studying Data Quality
OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.
On this episode, Gordon Hamilton and I discuss key data quality concepts, including those we have studied in some of our favorite data quality books and, more importantly, those we have implemented in our careers as data quality practitioners.
Gordon Hamilton is a Data Quality and Data Warehouse professional whose 30 years of experience in the information business encompasses many industries, including government, legal, healthcare, insurance, and financial. Gordon was most recently engaged in the healthcare industry in British Columbia, Canada, where he continues to advise several healthcare authorities on data quality and business intelligence platform issues.
Gordon Hamilton’s passion is to bring together:
- Exposure of business rules through data profiling as recommended by Ralph Kimball.
- Monitoring business rules in the EQTL (Extract-Quality-Transform-Load) pipeline leading into the data warehouse.
- Managing the business rule violations through systemic and specific solutions within the statistical process control framework of Shewhart/Deming.
- Researching how to sustain data quality metrics as the “fit for purpose” definitions change faster than the information product process can easily adapt.
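As a rough illustration of the third item, here is a minimal sketch of how business rule violations could be tracked with a Shewhart-style p-chart. It is not taken from the episode: the function names, the per-load batch structure, and the sample counts are all hypothetical, and the conventional three-sigma limits are an assumption.

```python
import math


def p_chart_limits(violations, batch_sizes):
    """Return the center line and per-batch (lower, upper) control limits.

    violations[i] is the number of rows in load i that broke a business rule;
    batch_sizes[i] is the number of rows checked in load i (hypothetical data).
    """
    p_bar = sum(violations) / sum(batch_sizes)  # overall violation rate
    limits = []
    for n in batch_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lower = max(0.0, p_bar - 3 * sigma)  # a rate cannot fall below 0
        upper = min(1.0, p_bar + 3 * sigma)  # or rise above 1
        limits.append((lower, upper))
    return p_bar, limits


def out_of_control(violations, batch_sizes):
    """Return the indexes of loads whose violation rate is outside the limits."""
    _, limits = p_chart_limits(violations, batch_sizes)
    flagged = []
    for i, (v, n) in enumerate(zip(violations, batch_sizes)):
        lower, upper = limits[i]
        rate = v / n
        if rate < lower or rate > upper:
            flagged.append(i)
    return flagged


# Hypothetical example: five nightly loads of 1,000 rows each.
violations = [10, 12, 9, 11, 40]
batch_sizes = [1000, 1000, 1000, 1000, 1000]
flagged = out_of_control(violations, batch_sizes)
```

In this made-up data only the last load falls outside the limits, which would point to a specific cause worth investigating. A violation rate that stays inside the limits but is still too high would instead suggest a systemic fix to the process itself.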
Gordon Hamilton’s moniker of DQStudent on Twitter hints at his plan to dovetail his Lean Six Sigma skills and experience with the data quality foundations to improve the manufacture of the “information product” in today’s organizations. Gordon is a member of IAIDQ, TDWI, and ASQ, as well as an enthusiastic reader of anything pertaining to data.
Gordon Hamilton recently became an Information Quality Certified Professional (IQCP), via the IAIDQ certification program.
Recommended Data Quality Books
The following list is by no means comprehensive and is in no particular order. These books were either discussed during this OCDQ Radio episode or are otherwise recommended for anyone looking to study data quality and its related disciplines:
- Data Driven: Profiting from Your Most Important Business Asset by Thomas Redman
- The Practitioner’s Guide to Data Quality Improvement by David Loshin
- Information Quality Applied: Best Practices for Improving Business Information, Processes and Systems by Larry English
- Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information by Danette McGilvray
- The Data Governance Imperative by Steve Sarsfield
- Data Quality Assessment by Arkady Maydanchik
- Data Quality: The Accuracy Dimension by Jack Olson
- Entity Resolution and Information Quality by John Talburt
- Practical Data Migration by John Morris
- Customer Data Integration: Reaching a Single Version of the Truth by Jill Dyché and Evan Levy
- Master Data Management in Practice: Achieving True Customer MDM by Dalton Cervo and Mark Allen
- Journey to Data Quality by Yang Lee, Leo Pipino, Richard Wang, James Funk
- Quality Without Tears: The Art of Hassle-Free Management by Philip Crosby
- Out of the Crisis by W. Edwards Deming
Popular OCDQ Radio Episodes
Clicking on the link will take you to the episode’s blog post:
- Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
- Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
- Gaining a Competitive Advantage with Data — Guest William McKnight discusses some of the practical, hands-on guidance provided by his book Information Management: Strategies for Gaining a Competitive Advantage with Data.
- Doing Data Governance — Guest John Ladley discusses his book How to Design, Deploy and Sustain Data Governance and how to understand the difference and relationship between data governance and enterprise information management.
- Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
- Measuring Data Quality for Ongoing Improvement — Guest Laura Sebastian-Coleman discusses bringing together a better understanding of what is represented in data with the expectations for use in order to improve the overall quality of data.
- The Blue Box of Information Quality — Guest Daragh O Brien on why Information Quality is bigger on the inside, using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”
- Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
- Good-Enough Data for Fast-Enough Decisions — Guest Julie Hunt discusses Data Quality and Business Intelligence, including the speed versus quality debate of near-real-time decision making, and the future of predictive analytics.
- The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
- The Art of Data Matching — Guest Henrik Liliendahl Sørensen discusses data matching concepts and practices, including different match techniques, candidate selection, presentation of match results, and business applications of data matching.