Data Quality and Miracle Exceptions

“Reading superhero comic books with the benefit of a Ph.D. in physics,” James Kakalios explained in The Physics of Superheroes, “I have found many examples of the correct description and application of physics concepts.  Of course, the use of superpowers themselves involves direct violations of the known laws of physics, requiring a deliberate and willful suspension of disbelief.”

“However, many comics need only a single miracle exception — one extraordinary thing you have to buy into — and the rest that follows as the hero and the villain square off would be consistent with the principles of science.”

“Data Quality is all about . . .”

It is essential to foster a marketplace of ideas about data quality in which a diversity of viewpoints is freely shared without bias, where everyone is invited to get involved in discussions and debates and have an opportunity to hear what others have to offer.

However, one of my biggest pet peeves about the data quality industry is that when I listen to analysts, vendors, consultants, and other practitioners discuss data quality challenges, I am often required to make a miracle exception for data quality.  In other words, I am given one extraordinary thing I have to buy into in order to be willing to buy their solution to all of my data quality problems.

These superhero comic book style stories usually open with a miracle exception telling me that “data quality is all about . . .”

Sometimes, the miracle exception is purchasing technology from the right magic quadrant.  Other times, the miracle exception is either following a comprehensive framework, or following the right methodology from the right expert within the right discipline (e.g., data modeling, business process management, information quality management, agile development, data governance, etc.).

But I am especially irritated by individuals who bash vendors for selling allegedly only reactive data cleansing tools, while selling their allegedly only proactive defect prevention methodology, as if we could avoid cleaning up the existing data quality issues, or we could shut down and restart our organizations, so that before another single datum is created or business activity is executed, everyone could learn how to “do things the right way” so that “the data will always be entered right, the first time, every time.”

Although these and other miracle exceptions do correctly describe the application of data quality concepts in isolation, by doing so, they also oversimplify the multifaceted complexity of data quality, requiring a deliberate and willful suspension of disbelief.

Miracle exceptions certainly make for more entertaining stories and more effective sales pitches, but oversimplifying complexity for the purposes of explaining your approach, or, even worse and sadly more common, preaching at people that your approach definitively solves their data quality problems, is nothing less than applying the principle of deus ex machina to data quality.

Data Quality and deus ex machina

Deus ex machina is a plot device whereby a seemingly unsolvable problem is suddenly and abruptly solved with the contrived and unexpected intervention of some new event, character, ability, or object.

This technique is often used in the marketing of data quality software and services, where the problem of poor data quality can seemingly be solved by a new event (e.g., creating a data governance council), a new character (e.g., hiring an expert consultant), a new ability (e.g., aligning data quality metrics with business insight), or a new object (e.g., purchasing a new data quality tool).

Now, don’t get me wrong.  I do believe various technologies and methodologies from numerous disciplines, as well as several core principles (e.g., communication, collaboration, and change management) are all important variables in the data quality equation, but I don’t believe that any particular variable can be taken in isolation and deified as the God Particle of data quality physics.

Data Quality is Not about One Extraordinary Thing

Data quality isn’t all about technology, nor is it all about methodology.  And data quality isn’t all about data cleansing, nor is it all about defect prevention.  Data quality is not about only one thing — no matter how extraordinary any one of its things may seem.

Battling the dark forces of poor data quality doesn’t require any superpowers, but it does require doing the hard daily work of continuously improving your data quality.  Data quality does not have a miracle exception, so please stop believing in one.

And for the love of high-quality data everywhere, please stop trying to sell us one.

A Superb Lyrebird is a Superb Liar

Superb Lyrebird

The Superb Lyrebird is a ground-dwelling Australian bird that is most notable for its superb ability to mimic almost any sound.  During an excellent special that I watched recently on the Discovery Channel, a Superb Lyrebird demonstrated this extraordinary ability by mimicking not only the sounds of many animals, including the human voice, but also various musical instruments, power tools such as drills and chainsaws, electronic devices such as car and fire alarms, and even some incredibly realistic-sounding gunshots and explosions.

Male lyrebirds use this ability mainly during their song and dance courtship rituals.

As fascinating (well, I find it fascinating) as this information is, you are probably wondering why I am blogging about it. 

No, despite the rumors circulating the Twitterverse, I am not auditioning for my own primetime show on Animal Planet.

However, I have recently been participating in the song and dance courtship ritual otherwise known as job interviews.

 

Résumés

I have always found the very concept of a résumé (or a curriculum vitae or, far more often nowadays, a LinkedIn profile) to be truly fascinating.  The idea that a well-written document (printed, electronic, or online) that provides a mixture of summarized and detailed information about your professional experience, career goals, job history, academic qualifications, and references can somehow encapsulate what kind of employee you would make is highly specious, at least in my humble opinion.

I think that the Superb Lyrebird is an excellent metaphor for a résumé because the job seeker is essentially attempting to mimic the sounds that the employer wants to hear.  Do you have an academic degree in a discipline relevant to the job opening?  If not, did you at least graduate from a prestigious college or university?  Does your job history include professional experience relevant to the job opening?  If not, did you at least have some past jobs with either impressive descriptions or titles?  Are your career goals ambitious enough—but not so ambitious that they could be considered potentially threatening to your new direct manager?

I am not suggesting that these questions are completely irrelevant, nor am I suggesting that some level of screening can’t be effectively performed using them.  However, is it really difficult to make sure that your résumé at least sounds good?

 

Gaming the System

Although you can’t embellish your education, you can easily get quite creative with the rest, such as using the right keywords in your job descriptions.  A cursory review (either manual or automated) of keywords is still a very common practice performed by human resources (HR) during the preliminary screening to determine which résumés will reach the desks of hiring managers.

So it would seem that “gaming the system” is what you have to do if you want to secure gainful employment.  In fact, it could be easily argued that the system is purposely designed to be gamed.

This is akin to my university literature professor not really caring what I actually thought about Don Quixote.  If I wanted to pass the final exam, then I had to mimic the professor’s belief that Miguel de Cervantes intended his wonderful novel to be an allegory for the critical but sometimes dangerous role that an active imagination can play in the human experience.

Telling my literature professor what he wanted to hear doesn’t mean that I truly appreciated or even understood the brilliance of the novel.  Although I gained the experience of reading it, passed a course that contributed to my graduation, and can sound good at a dinner party where guests have an interest in discussing the novel with me, does that really qualify me as an expert?

I can play buzzword bingo with the best of the best.  I can quote from the books and blogs of industry thought leaders.  I can customize my résumé so it’s loaded with all the right keywords.  I can use my Internet prowess to wow you during a telephone interview by using Google and Wikipedia to sound like the smartest man on the planet.  I can cram for the in-person interview like I crammed for my literature final exam because if I do my research well, I will know every question you are going to ask, and I will know exactly how you want me to answer them.

 

A Superb Lyrebird is a Superb Liar

Just like a Superb Lyrebird convincingly mimicking a lion only makes it sound like a lion, and convincingly mimicking what my professor wants to hear only makes me sound like a great student, convincingly mimicking what you are looking for only makes me sound like a potentially great employee.  But how many “lions on paper” or “lions during the interview” have you or your organization hired only to end up with a mostly flightless bird incapable of doing anything other than sounding impressive?

The reason that this happens is incredibly simple—a Superb Lyrebird is a Superb Liar.

However, my point is not to suggest either that job seekers are deceptive or that employers are easily deceived.

My point is that I believe the system is fundamentally broken because it actually encourages job seekers to act like lyrebirds and actually encourages employers to hire lyrebirds.

In my career, I have been on both sides of the interview desk.  I have made hiring recommendations that resulted in terrible employees, as well as disagreed with hiring decisions that resulted in excellent employees.  I have performed poorly during interviews that resulted in getting hired anyway, as well as performed brilliantly during interviews that resulted in no offer. 

I acknowledge that some truly qualified people, who would make great employees, simply do not interview well.  Some people (including so-called “professional students”) excel at interviews (and in the classroom), but at absolutely nothing else.  Also, some interviewers simply do not know how to conduct a truly effective interview (or in some cases, how to conduct a legal interview).

Therefore, I completely accept that there is no way to perfect the process (and that I am also making sweeping generalizations).

 

Tilting at Windmills

Recently I have been very disappointed with both the questions that I have and have not been asked during an interview. 

I have also been very disappointed to observe interviewers getting frustrated with me for telling them the truth as opposed to telling them what they wanted to hear. 

Perhaps I should just play along like a good little Superb Lyrebird?  It certainly sounds like that is what is expected of me.

After all, The Ingenious Hidalgo Don Quixote of La Mancha is really an allegory about deception, both self-deception and the deception imposed on us by others—and about acknowledging not only the negative, but also the positive aspects of deception.

My good friend Sancho has just arrived, meaning it’s time once again to do battle with the hulking giants and try to slay them. 

Even though I know that I am really only tilting at windmills, for whatever reason, it still always makes me feel better anyway.

Freemium is the future – and the future is now

Earlier this week, two excellent blog posts (Three Ways to Start a Revolution by James Chartrand on Men with Pens, and Your Dream is Under Attack by Nathan Hangen on Copyblogger) discussed the somewhat polarizing debate about making money from blogging.  Blogging for money is one of many examples of the so-called “freemium” business model, which was first articulated in 2006 by venture capitalist Fred Wilson:

“Give your service away for free, acquire a lot of customers very efficiently through word of mouth and referral networks, then offer premium priced, value added services or an enhanced version of your service to your customer base.”

In 2009, Chris Anderson published the book Free: The Future of a Radical Price, which, among other coverage, was critically reviewed in the article Priced to Sell by Malcolm Gladwell and discussed in an interview conducted by Charlie Rose.

 

Isn't everything on the Internet supposed to be free?

The freemium model, as well as the concept expressed in Anderson's book, is not entirely about the Internet.  However, it is most often at the center of polarized debates because more and more businesses, in varying degrees, are becoming online businesses.

The general public perception is that the Internet is free.  Getting on the Internet does have a cost (sometimes conveniently ignored) in terms of electricity, ISPs, and the various computers and mobile devices used to access it.  However, once you are connected, the content on the Internet is either free or is supposed to be free, at least according to the “logic” of a very common perspective.

To be fair, this is somewhat understandable, especially given the fact that many of the most popular online services, such as Twitter, Facebook, and YouTube, to name but three examples from countless others, are in fact, free – and their users often defiantly claim that they would never pay any amount of money for such a service.

 

So how does the Internet make money?

The Internet has traditionally made money the same way broadcast television (also “free” when you conveniently ignore the cost of electricity, cable and satellite providers, and the various devices used to access it) has traditionally made money – advertising.

To paraphrase (and oversimplify) the words of Chris Anderson, there have been three generations of making money on the Internet:

  1. Pop-Up Ads – In the beginning was the pop-up ad, and it was not good.  Do you still remember (or are you old enough to remember) the early days of the Internet?  Nearly every website you visited brought a seemingly random attack of pop-up ads.  Even after the invention of pop-up blockers and the advent of alternatives to pop-up ads, online advertising was not very context-sensitive, and it was not only annoying but also largely ineffective.

  2. Google AdSense – The next generation of advertising was basically pioneered by Google (or companies it now owns).  Exemplified by the now somewhat ubiquitous Google AdSense, ads specific to website content provide online advertising that is both less annoying and seemingly far more effective.

  3. Freemium – We are just entering the third generation of making money on the Internet, and the first one not ruled by advertising, at least not advertising in the “traditional” sense.  Under this new model, free online content is made available to everyone, providing the opportunity to “up-sell” premium content to a (typically small) percentage of your audience.

 

Freemium is NOT a new concept

Although many Internet users become seemingly outraged by the very notion of the option to purchase premium content, the idea of giving away something for free in order to facilitate a potential purchase is by no means a new concept.

Just a few simple examples include:

  • Samples at the mall food court are free, but you have to pay to eat a full meal
  • Movie previews are free, but you have to pay to watch an entire movie
  • Broadcast television shows are free, but you have to pay for the DVD box sets

The Internet, however, has seemingly always been viewed as a special case.

I believe this is mostly due to the ratio of free to premium.  Food samples, movie previews, and individual episodes of a television show are small compared to a full meal, a full-length movie, and a full season (or series) of episodes.

In other words, what we get for free isn't much, so paying for the rest makes more sense.  On the Internet, this ratio is reversed. 

Since almost everything on the Internet is free (again, after the cost of connection), we are genuinely, and perhaps really quite understandably, surprised or even annoyed when we encounter something that we are asked to pay for.

In other words, since we get so much for free, paying just to get a little more simply doesn't seem to make sense. 

After all, if the full meals at the mall food court were free, we certainly wouldn't pay just to eat samples.

(And yes—I do realize that was a terrible analogy on so many levels—so please stop yelling at me.)

 

Isn't freemium the end of the world as we know it?

Obviously, the real issue is not the ratio of free to premium, or how much you should (or should not) expect to get for free. 

The fundamental argument is that anything you pay for should be worth the price.

Historically, price has been the indicator of value, meaning something has value only if people are willing to pay for it.  Higher prices, in theory at least, indicate higher value, especially if people are willing to purchase at the higher price.

So, if people are willing to pay for it, then this indicates there is a demand for it, for which a supply of it must be produced. 

(And yes—I do realize that was a huge oversimplification of economic theory—so yet again, please stop yelling at me.)

One of the most common counter-arguments to the freemium model is that if price is allowed to essentially drop to zero, then there will be no way to accurately measure demand, which means there will be no way for content producers to determine what to supply.  Furthermore, if almost everything is free, then why would content consumers be willing to pay for anything at all?

If nobody is willing to pay, then nobody can possibly get paid, and all online content will be completely user-generated.  Then, following Andrew Keen's argument in The Cult of the Amateur, a cultural apocalypse occurs, reducing not only the Internet but the entirety of human expression to us hurling our feces at each other just like our primate cousins.

(You may feel free to resume yelling at me now.)

 

Freemium is the future—and the future is now

Obviously, the freemium business model doesn't only apply to blogging.  By the way, it is totally understandable if you had forgotten that my lunatic fringe was ignited by the debate over making money from blogging.

Freemium is the future of most of the business world—and the harsh reality is—the future has already arrived.

In my opinion, too many people, companies, and in some cases, entire industries, are wasting their time, effort, and money trying to fight the unrelenting reality of freemium.  Instead of refusing to accept that the price of what you are now offering may be falling essentially to zero—focus on creating something new that people would be willing to pay for.

Once again, to paraphrase Chris Anderson, “free” is only one of many markets—and only one of many additional pricing levels. 

Don't stop at thinking about just two versions of each individual product or service—one free version and one premium version.  You should be thinking about one free version and multiple tiers of premium.  Value still drives price.  Therefore, if you can truly add more value at each tier, then you can successfully demand a higher price.

Freemium works as a viable model because people will always be willing to pay a premium for something worth its price.

If you can't (or can no longer) produce something your customers are willing to pay for—that's your problem, not theirs.

Data Quality Whitepapers are Worthless

During a 1609 interview, William Shakespeare was asked his opinion about an emerging genre of theatrical writing known as Data Quality Whitepapers.  The "Bard of Avon" was clearly not a fan.  His famously satirical response was:

Data quality's but a writing shadow, a poor paper

That struts and frets its words upon the page

And then is heard no more:  it is a tale

Told by a vendor, full of sound and fury

Signifying nothing.

 

Four centuries later, I find myself in complete agreement with Shakespeare (and not just because Harold Bloom told me so).

 

Today is April Fool's Day, but I am not joking around - call Dennis Miller and Lewis Black - because I am ready to RANT.

 

I am sick and tired of reading whitepapers.  Here is my "Bottom Ten List" explaining why: 

  1. Ones that make me fill out a "please mercilessly spam me later" contact information form before I am allowed to download them remind me of Mrs. Bun: "I DON'T LIKE SPAM!"
  2. Ones that, after I read their supposed pearls of wisdom, make me shake my laptop violently like an Etch-A-Sketch.  I have lost count of how many laptops I have destroyed this way.  I have started buying them in bulk at Wal-Mart.
  3. Ones composed entirely of the exact same information found on the vendor's website make www = World Wide Worthless.
  4. Ones that start out good, but just when they get to the really useful stuff, refer to content only available to paying customers.  What a great way to guarantee that neither I nor anyone I know will ever become your paying customer!
  5. Ones that have a "Shock and Awe" title followed by "Aw Shucks" content because apparently the entire marketing budget was spent on the title.
  6. Ones that promise me the latest BUZZ but deliver only ZZZ are worthless except when I have insomnia.
  7. Ones that claim to be about data quality, but have nothing at all to do with data quality:  "...don't make me angry.  You wouldn't like me when I'm angry."
  8. Ones that take the adage "a picture is worth a thousand words" too far by using a dizzying collage of logos, charts, graphs and other visual aids.  This is one reason we're happy that Pablo Picasso was a painter.  However, he did once write that "art is a lie that makes us realize the truth."  Maybe he was defending whitepapers.
  9. Ones that use acronyms without ever defining what they stand for remind me of that scene from Good Morning, Vietnam: "Excuse me, sir.  Seeing as how the VP is such a VIP, shouldn't we keep the PC on the QT?  Because if it leaks to the VC he could end up MIA, and then we'd all be put on KP."
  10. Ones that really know they're worthless but aren't honest about it.  Don't promise me "The Top 10 Metrics for Data Quality Scorecards" and give me a list as pointless as this one.

 

I am officially calling out all writers of Data Quality Whitepapers. 

Shakespeare and I both believe that you can't write anything about data quality that is worth reading. 

Send your data quality whitepaper to Obsessive-Compulsive Data Quality, and if it is not worthless, then I will let the world know that you proved Shakespeare and me wrong.

 

And while I am on a rant roll, I am officially calling out all Data Quality Bloggers.

The International Association for Information and Data Quality (IAIDQ) is celebrating its five-year anniversary by hosting:

El Festival del IDQ Bloggers – A Blog Carnival for Information/Data Quality Bloggers

For more information about the blog carnival, please follow this link:  IAIDQ Blog Carnival