Availability Bias and Data Quality Improvement
The availability heuristic is a mental shortcut that occurs when people make judgments based on the ease with which examples come to mind. Although this heuristic can be beneficial, such as when it helps us recall examples of a dangerous activity to avoid, sometimes it leads to availability bias, where we’re affected more strongly by the ease of retrieval than by the content retrieved.
In his thought-provoking book Thinking, Fast and Slow, Daniel Kahneman explained how availability bias works by recounting an experiment in which college students were asked to rate a course they had taken the previous semester by listing ways to improve it, with different groups required to list different numbers of improvements.
Counterintuitively, students required to list more improvements gave the course a higher rating, whereas students required to list fewer improvements gave it a lower rating.
According to Kahneman, the extra cognitive effort expended by the students required to list many improvements biased them into believing it was difficult to list necessary improvements, leading them to conclude that the course didn’t need much improvement. Conversely, the students required to list only a few improvements expended little cognitive effort, which biased them into concluding that, since it was so easy to list necessary improvements, the course obviously needed improvement.
This is counterintuitive because you’d expect the students to rate the course based on an assessment of the information they retrieved from memory, regardless of how easy that information was to retrieve. It would have made more sense for a course needing fewer improvements to be rated higher, but availability bias led the students to the opposite conclusion.
Availability bias can also affect an organization’s discussions about the need for data quality improvement.
If you asked stakeholders to rate the organization’s data quality by listing business-impacting incidents of poor data quality, would they reach a different conclusion if you asked them to list one incident versus asking them to list at least ten incidents?
In my experience, an event where poor data quality negatively impacted the organization, such as a regulatory compliance failure, is often easily dismissed by stakeholders as an isolated incident to be corrected by a one-time data cleansing project.
But would forcing stakeholders to list ten business-impacting incidents of poor data quality make them concede that data quality improvement should be supported by an ongoing program? Or would the extra cognitive effort bias them into concluding, since it was so difficult to list ten incidents, that the organization’s data quality doesn’t really need much improvement?
I think the availability heuristic helps explain why most organizations easily approve reactive data cleansing projects, while availability bias helps explain why they usually resist proactively initiating an ongoing data quality improvement program.
Related Posts
- DQ-View: The Five Stages of Data Quality
- Data Quality and Chicken Little Syndrome
- You only get a Return from something you actually Invest in
- “Some is not a number and soon is not a time”
- Why isn’t our data quality worse?
- Data Quality and the Bystander Effect
- Perception Filters and Data Quality
Related OCDQ Radio Episodes
Clicking on the link will take you to the episode’s blog post:
- Data Driven — Guest Tom Redman (aka the “Data Doc”) discusses concepts from his most recent book, Data Driven: Profiting from Your Most Important Business Asset, one of my favorite data quality books.
- Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
- The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
- Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
- The Blue Box of Information Quality — Guest Daragh O Brien discusses why Information Quality is bigger on the inside, using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”
- Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.