An example of the challenge of data accuracy and the possible misinformation provided by key performance metrics inspired by the investigative reporting of the HBO satirical news show Last Week Tonight with John Oliver.
An example of the importance of performing some basic data quality checks before displaying information, inspired by a graphic about the recent referendum on Scotland’s independence from the United Kingdom.
While the use of a postal validation service is a highly recommended best practice for ensuring valid addresses are entered when data is created, having valid data doesn’t guarantee that you have accurate data.
Over the weekend, in preparation for watching the Boston Red Sox, I bought some beer and pizza. Later that night, after a thrilling victory that sent the Red Sox to the 2013 World Series, I was cleaning up the kitchen and was about to throw out the receipt when I couldn’t help but notice two data quality issues.
First, although I had purchased Samuel Adams Octoberfest, the receipt indicated I had bought Spring Ale, which, although still available in some places and still good beer, is three seasonal beers (Summer Ale, Winter Lager, Octoberfest) old. This data quality issue impacts the store’s inventory and procurement systems (e.g., maybe the store orders more Spring Ale next year because people were apparently still buying it in October this year).
The second, and far more personal, data quality issue was that the age verification portion of my receipt indicated I was born on or before November 22, 1922, making me at least 91 years old! While I am of the age (42) typical of a midlife crisis, I wasn’t driving a new red sports car, just wearing my old Red Sox sports jersey and hat. As for the store, this data quality issue could be viewed as a regulatory compliance failure since it seems like their systems are set up by default to allow the sale of alcohol without proper age verification. Additionally, this data quality issue might make it seem like their only alcohol-purchasing customers are very senior citizens.
What examples (good or poor) of data quality have you encountered? Please share them by posting a comment below.
As an avid reader, I tend to redeem most of my American Express Membership Rewards points for Barnes & Noble gift cards to buy new books for my Nook. As a data quality expert, I tend to notice when something is amiss with data. As shown above, for example, my recent gift card was apparently issued on — and only available for use until — January 1, 1900.
At first, I thought I might have encountered the time traveling gift card. However, I doubted the gift card would be accepted as legal tender in 1900. Then I thought my gift card was actually worth $1,410 (what $50 in 1900 would be worth today), which would allow me to buy a lot more books — as long as Barnes & Noble would overlook the fact the gift card expired 113 years ago.
Fortunately, I was able to use the gift card to purchase $50 worth of books in 2013.
So, I guess the moral of this story is that sometimes poor data quality does pay. However, it probably never pays to display your poor data quality to someone who runs an obsessive-compulsive data quality blog with a series about data quality by example.
What examples (good or poor) of data quality have you encountered in your time travels?
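A date like January 1, 1900 often looks like a system default that was never replaced with a real value, although that is only my guess about what happened with this gift card. Here is a minimal sketch of the kind of basic check that could catch it before printing. The function name, the list of placeholder dates, and the rules are all assumptions for illustration, not how any real gift card system works.

```python
from datetime import date

# Assumed placeholder values; a real system may use different defaults.
PLACEHOLDER_DATES = {date(1900, 1, 1), date(1970, 1, 1)}


def gift_card_date_issues(issued: date, expires: date, today: date) -> list:
    """Return a list of human-readable problems found in the card's dates."""
    issues = []
    if issued in PLACEHOLDER_DATES:
        issues.append("issue date looks like a placeholder")
    if expires in PLACEHOLDER_DATES:
        issues.append("expiration date looks like a placeholder")
    if expires <= issued:
        issues.append("expiration date is not after the issue date")
    if issued > today:
        issues.append("issue date is in the future")
    return issues


# The dates printed on the gift card, checked on a hypothetical day in 2013.
problems = gift_card_date_issues(date(1900, 1, 1), date(1900, 1, 1), date(2013, 6, 1))
for problem in problems:
    print(problem)
```

A check like this only says the dates are suspicious. It cannot say what the correct dates should have been.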
I recently received my invitation to the Data Governance and Information Quality Conference, which will be held June 27-30 in San Diego, California at the Catamaran Resort Hotel and Spa. Well, as shown above, I actually received both of my invitations.
Although my postal address is complete, accurate, and exactly the same on both of the invitations, my name is slightly different (“James” vs. “Jim”), and my title (“Data Quality Journalist” vs. “Blogger-in-Chief”) and company (“IAIDQ” vs. “OCDQ Blog”) are both completely different. I wonder how many of the data quality software vendors sponsoring this conference would consider my invitations to be duplicates. (Maybe I’ll use the invitations to perform a vendor evaluation on the exhibit floor.)
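To make the duplicate question concrete, here is a minimal sketch of one simple matching rule: treat two records as likely duplicates when the postal address is the same and the names agree after nickname standardization. The nickname table, the field names, and the sample records are all hypothetical, and real data quality tools use far more sophisticated matching than this.

```python
# Hypothetical nickname lookup; a real tool would use a much larger table.
NICKNAMES = {"jim": "james", "jimmy": "james"}


def normalize_name(name: str) -> str:
    """Lowercase the name and replace a leading nickname with its formal form."""
    parts = name.lower().split()
    if not parts:
        return ""
    parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def normalize_address(address: str) -> str:
    """Lowercase the address and collapse commas and repeated whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


def likely_duplicates(a: dict, b: dict) -> bool:
    """Ignore title and company; compare only normalized name and address."""
    return (
        normalize_name(a["name"]) == normalize_name(b["name"])
        and normalize_address(a["address"]) == normalize_address(b["address"])
    )


# Placeholder records shaped like the two invitations (not the real data).
first = {"name": "James Example", "title": "Data Quality Journalist",
         "company": "IAIDQ", "address": "123 Main Street, Anytown"}
second = {"name": "Jim Example", "title": "Blogger-in-Chief",
          "company": "OCDQ Blog", "address": "123 Main Street,  Anytown"}

print(likely_duplicates(first, second))
```

Deliberately ignoring title and company is a design choice here, since those were the fields that differed completely. A rule that required them to match would let both invitations through.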
So it would seem that even “The Premier Event in Data Governance and Data Quality” can experience data quality problems.
No worries, I doubt the invitation system will be one of the “Practical Approaches and Success Stories” presented—unless it’s used as a practical approach to a success story about demonstrating how embarrassing it might be to send duplicate invitations to a data quality journalist and blogger-in-chief. (I wonder if this blog post will affect the approval of my Press Pass for the event.)
Okay, on a far more serious note, you should really consider attending this event. As the conference agenda shows, there will be great keynote presentations, case studies, tutorials, and other sessions conducted by experts in data governance and data quality, including (among many others) Larry English, Danette McGilvray, Mike Ferguson, David Loshin, and Thomas Redman.
The term “valued customer” is bandied about quite frequently and is often at the heart of enterprise data management initiatives such as Customer Data Integration (CDI), 360° Customer View, and Customer Master Data Management (MDM).
The role of data quality in these initiatives is an important, but sometimes mistakenly overlooked, consideration.
For example, the Service Contract Renewal Notice (shown above) I recently received exemplifies the impact of poor data quality on Customer Relationship Management (CRM) since one of my service providers wants me—as a valued customer—to purchase a new service contract for one of my laptop computers.
Let’s give them props for generating a 100% accurate residential postal address, since how could I even consider renewing my service contract if I don’t receive the renewal notice in the mail? Let’s also acknowledge that my Customer ID is 100% accurate, since that is the “unique identifier” under which I have purchased all of my products and services from this company.
However, the biggest data quality mistake is that the name of their “Valued Customer” is not INDEPENDENT CONSULTANT. (And they get bonus negative points for writing it in ALL CAPS.)
The moral of the story is that if you truly value your customers, then you should truly value your customer data quality.
At the very least—get your customer’s name right.
Photo via Flickr by: Leo Reynolds
Like truth, beauty, and singing ability, data quality is in the eyes of the beholder.
Data’s quality is determined by evaluating its fitness for the purpose of use. However, in the vast majority of cases, data has multiple uses, and data of sufficient quality for one use may not be of sufficient quality for other uses.
Therefore, to be more accurate, data quality is in the eyes of the user.
The perspective of the user provides a relative context for data quality. Many argue an absolute context for data quality exists, one which is independent of the often conflicting perspectives of different users.
This absolute context is often referred to as a “Single Version of the Truth.”
As one example of the challenges inherent in this data quality key concept, let’s consider if there is a “Single Version of the Time.”
Single Version of the Time
I am writing this blog post at 10:00 AM. I am using time in a relative context, meaning that from my perspective it is 10 o’clock in the morning. I live in the Central Standard time zone (CST) of the United States.
My friend in Europe would say that I am writing this blog post at 5:00 PM. He is also using time in a relative context, meaning that from his perspective it is 5 o’clock in the afternoon. My friend lives in the Central European time zone (CET).
We could argue that an absolute time exists, as defined by Coordinated Universal Time (UTC). Local times around the world can be expressed as a relative time using positive or negative offsets from UTC. For example, my relative time is UTC-6 and my friend’s relative time is UTC+1. Alternatively, we could use absolute time and say that I am writing this blog post at 16:00 UTC.
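The arithmetic in this example can be sketched in a few lines of Python using fixed offsets from UTC. The calendar date is an arbitrary placeholder, and fixed offsets deliberately ignore daylight saving time, so treat this as an illustration of the idea and not a recipe for real scheduling.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets from UTC, as described in the text (no daylight saving rules).
CST = timezone(timedelta(hours=-6), "CST")
CET = timezone(timedelta(hours=1), "CET")

# 10:00 AM from my perspective; the date itself is an arbitrary placeholder.
my_time = datetime(2010, 1, 15, 10, 0, tzinfo=CST)

# The same instant, expressed in the shared (UTC) and my friend's (CET) context.
shared_time = my_time.astimezone(timezone.utc)
friend_time = my_time.astimezone(CET)

print(my_time.strftime("%H:%M %Z"))      # 10:00 CST
print(shared_time.strftime("%H:%M %Z"))  # 16:00 UTC
print(friend_time.strftime("%H:%M %Z"))  # 17:00 CET

# All three values describe one instant, so they compare as equal.
print(my_time == shared_time == friend_time)  # True
```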
Although using an absolute time is an absolute necessity if, for example, my friend and I wanted to schedule a time to have a telephone (or Skype) discussion, it would be confusing to use UTC when referring to events relative to our local time zone.
In other words, the relative context of the user’s perspective is valid and an absolute context independent of the perspectives of different users is also valid—especially whenever a shared perspective is necessary in order to facilitate dialogue and discussion.
Therefore, instead of calling UTC a Single Version of the Time, we could call it a Shared Version of the Time. And when it comes to the data quality concept of a Single Version of the Truth, perhaps it’s time we started calling it a Shared Version of the Truth.
“Good morning sir!” said the smiling gentleman behind the counter—and a little too cheerily for 5 o’clock in the morning. “Welcome to the check-in counter for Data Quality Airlines. My name is Edward. How may I help you today?”
“Good morning Edward,” I replied. “My name is John Smith. I am traveling to Boston today on flight number 221.”
“Thank you for choosing Data Quality Airlines!” responded Edward. “May I please see your driver’s license, passport, or other government-issued photo identification so that I can verify your data accuracy?”
As I handed Edward my driver’s license, I explained, “It’s an old photograph in which I was clean-shaven, wearing contact lenses, and ten pounds lighter,” since I now had a full beard, was wearing glasses, and, to be honest, was actually thirty pounds heavier.
“Oh,” said Edward, his plastic smile morphing into a more believable and stern frown. “I am afraid you are on the No Fly List.”
“Oh, that’s right—because of my name being so common!” I replied while fumbling through my backpack, frantically searching for the piece of paper, which I then handed to Edward. “I’m supposed to give you my Redress Control Number.”
“Actually, you’re supposed to use your Redress Control Number when making your reservation,” Edward retorted.
“In other words,” I replied, while sporting my best plastic smile, “although you couldn’t verify the accuracy of my customer data when I made my reservation online last month, you were able to verify the authorization to immediately charge my credit card for the full price of purchasing a non-refundable plane ticket to fly on Data Quality Airlines.”
“I don’t appreciate your sense of humor,” replied Edward. “Everyone at Data Quality Airlines takes accuracy very seriously.”
Edward printed my boarding pass, wrote BCS on it in big letters, handed it to me, and with an even more plastic smile cheerily returning to his face, said: “Please proceed to the security checkpoint. Thank you again for choosing Data Quality Airlines!”
“Boarding pass?” asked the not-at-all smiling woman at the security checkpoint. After I handed her my boarding pass, she said, “And your driver’s license, passport, or other government-issued photo identification so that I can verify your data accuracy.”
“I guess my verified data accuracy at the Data Quality Airlines check-in counter must have already expired,” I joked as I handed her my driver’s license. “It’s an old photograph in which I was clean-shaven, wearing contact lenses, and ten pounds lighter.”
The woman silently examined my boarding pass and driver’s license, circled BCS with a magic marker, and then shouted over her shoulder to a group of not-at-all smiling security personnel standing behind her: “Randomly selected security screening!”
One of them, a very large man, stepped toward me as the sound from the snap of the fresh latex glove he had just placed on his very large hand echoed down the long hallway that he was now pointing me toward. “Right this way sir,” he said with a smile.
Ten minutes later, as I slowly walked to the gate for Data Quality Airlines Flight Number 221 to Boston, the thought echoing through my mind was that there is no such thing as data accuracy—there are only verifiable assertions of data accuracy . . .