Schrödinger's Data Quality

In 1935, Austrian physicist Erwin Schrödinger described a now famous thought experiment where:

  “A cat, a flask containing poison, a tiny bit of radioactive substance and a Geiger counter are placed into a sealed box for one hour.  If the Geiger counter doesn't detect radiation, then nothing happens and the cat lives.  However if radiation is detected, then the flask is shattered, releasing the poison which kills the cat.  According to the Copenhagen interpretation of quantum mechanics, until the box is opened, the cat is simultaneously alive and dead.  Yet, once you open the box, the cat will either be alive or dead, not a mixture of alive and dead.” 

This was only a thought experiment.  Therefore, no actual cat was harmed. 

This paradox of quantum physics, known as Schrödinger's Cat, poses the question:

  “When does a quantum system stop existing as a mixture of states and become one or the other?”

 

Unfortunately, data quality projects are not thought experiments.  They are complex, time consuming and expensive enterprise initiatives.  Typically, a data quality tool is purchased, expert consultants are hired to supplement staffing, production data is copied to a development server and the project begins.  Until it is completed and the new system goes live, the project is a potential success or failure.  Yet, once the new system starts being used, the project will become either a success or failure.

This paradox, which I refer to as Schrödinger's Data Quality, poses the question:

  “When does a data quality project stop existing as potential success or failure and become one or the other?”

 

Data quality projects should begin with the parallel and complementary efforts of drafting the business requirements while also performing a data quality assessment, which can help you:

  • Verify data matches the metadata that describes it
  • Identify potential missing, invalid and default values
  • Prepare meaningful questions for subject matter experts
  • Understand how data is being used
  • Prioritize critical data errors
  • Evaluate potential ROI of data quality improvements
  • Define data quality standards
  • Reveal undocumented business rules
  • Review and refine the business requirements
  • Provide realistic estimates for development, testing and implementation

Therefore, the data quality assessment assists with aligning perception with reality and gets the project off to a good start by providing a clear direction and a working definition of success.

 

However, a common mistake is to view the data quality assessment as a one-time event that ends when development begins. 

 

Projects should perform iterative data quality assessments throughout the entire development lifecycle, which can help you:

  • Gain a data-centric view of the project's overall progress
  • Build data quality monitoring functionality into the new system
  • Promote data-driven development
  • Enable more effective unit testing
  • Perform impact analysis on requested enhancements (i.e. scope creep)
  • Record regression cases for testing modifications
  • Identify data exceptions that require suspension for manual review and correction
  • Facilitate early feedback from the user community
  • Correct problems that could undermine user acceptance
  • Increase user confidence that the new system will meet their needs

 

If you wait until the end of the project to learn if you have succeeded or failed, then you treat data quality like a game of chance.

And to paraphrase Albert Einstein:

  “Do not play dice with data quality.”