The Hawthorne Effect, Helter Skelter, and Data Governance

In his book The Half-life of Facts: Why Everything We Know Has an Expiration Date, Samuel Arbesman introduced me to the Hawthorne Effect, which is “when subjects behave differently if they know they are being studied.  The effect was named after what happened in a factory called Hawthorne Works outside Chicago in the 1920s and 1930s.”

“Scientists wished to measure,” Arbesman explained, “the effects of environmental changes on the productivity of workers.  They discovered whatever they did to change the workers’ behaviors — whether they increased the lighting or altered any other aspect of the environment — resulted in increased productivity.  However, as soon as the study was completed, productivity dropped.  The researchers concluded that the observations themselves were affecting productivity and not the experimental changes.”

I couldn’t help but wonder how the Hawthorne Effect could affect a data governance program.  When data governance policies are first defined, and their associated procedures and processes are initially implemented, productivity often increases after a little while (and usually after a little resistance), and the organization begins to advance its data governance maturity level.

Perhaps during these early stages employees are well aware that they’re being observed to make sure they’re complying with the new data governance policies, and this observation itself accounts for advancing to the next maturity level, especially since, after progress stops being studied so closely, it’s not uncommon for an organization to backslide to an earlier maturity level.

You might be tempted to conclude that continuous monitoring, especially of the Orwellian Big Brother variety, might be able to prevent this from happening, but I doubt it.  Data governance maturity is often misperceived in the same way that expertise is misperceived — as a static state that once achieved signifies a comforting conclusion to all the grueling effort that was required, either to become an expert, or reach a particular data governance maturity level.

However, just like the five stages of data quality, oscillating between different levels of data governance maturity, and perhaps even occasionally coming full circle, may be an inevitable part of the ongoing evolution of a data governance program, which can often feel like a top-down/bottom-up amusement park ride of the Beatles “Helter Skelter” variety:

When you get to the bottom, you go back to the top, where you stop and you turn, and you go for a ride until you get to the bottom — and then you do it again.

Come On Tell Me Your Answers

Do you, don’t you . . . think the Hawthorne Effect affects data governance?

Do you, don’t you . . . think data governance is Helter Skelter?

Tell me, tell me, come on tell me your answers — by posting a comment below.

Big Data and the Infinite Inbox

Occasionally it’s necessary to temper the unchecked enthusiasm accompanying the peak of inflated expectations associated with any hype cycle.  This may be especially true for big data, and especially now since, as Svetlana Sicular of Gartner recently blogged, big data is falling into the trough of disillusionment and “to minimize the depth of the fall, companies must be at a high enough level of analytical and enterprise information management maturity combined with organizational support of innovation.”

I fear the fall may feel bottomless for those who fell hard for the hype and believe the Big Data Psychic capable of making better, if not clairvoyant, predictions.  When, in fact, “our predictions may be more prone to failure in the era of big data,” explained Nate Silver in his book The Signal and the Noise: Why So Many Predictions Fail but Some Don’t.  “There isn’t any more truth in the world than there was before the Internet.  Most of the data is just noise, as most of the universe is filled with empty space.”

Proposing the 3Ss (Small, Slow, Sure) as a counterpoint to the 3Vs (Volume, Velocity, Variety), Stephen Few recently blogged about the slow data movement.  “Data is growing in volume, as it always has, but only a small amount of it is useful.  Data is being generated and transmitted at an increasing velocity, but the race is not necessarily for the swift; slow and steady will win the information race.  Data is branching out in ever-greater variety, but only a few of these new choices are sure.”

Big data requires us to revisit information overload, a term that was originally about not the increasing amount of information, but rather the increasing access to information.  As Clay Shirky stated, “It’s not information overload, it’s filter failure.”

As Silver noted, the Internet (like the printing press before it) was a watershed moment in our increased access to information, but its data deluge didn’t increase the amount of truth in the world.  And in today’s world, where many of us strive on a daily basis to prevent email filter failure and achieve what Merlin Mann called Inbox Zero, I find unfiltered enthusiasm about big data to be rather ironic, since big data is essentially enabling the data-driven decision making equivalent of the Infinite Inbox.

Imagine logging into your email every morning and discovering: You currently have (∞) Unread Messages.

However, I’m sure most of it would be spam, which you obviously wouldn’t have any trouble quickly filtering (after all, infinity minus spam must be a back-of-the-napkin calculation), allowing you to read only the truly useful messages.  Right?

 

Related Posts

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Open MIKE Podcast — Episode 05: Defining Big Data

Will Big Data be Blinded by Data Science?

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

A Statistically Significant Resolution for 2013

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Swimming in Big Data

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Popeye, Spinach, and Data Quality

As a kid, one of my favorite cartoons was Popeye the Sailor, who was empowered by eating spinach to take on many daunting challenges, such as battling his brawny nemesis Bluto for the affections of his love interest Olive Oyl, often kidnapped by Bluto.

I am reading the book The Half-life of Facts: Why Everything We Know Has an Expiration Date by Samuel Arbesman, who, while examining how a novel fact, even a wrong one, spreads and persists, explained that one of the strangest examples of the spread of an error is related to Popeye the Sailor.  “Popeye, with his odd accent and improbable forearms, used spinach to great effect, a sort of anti-Kryptonite.  It gave him his strength, and perhaps his distinctive speaking style.  But why did Popeye eat so much spinach?  What was the reason for his obsession with such a strange food?”

The truth begins over fifty years before the comic strip made its debut.  “Back in 1870,” Arbesman explained, “Erich von Wolf, a German chemist, examined the amount of iron within spinach, among many other green vegetables.  In recording his findings, von Wolf accidentally misplaced a decimal point when transcribing data from his notebook, changing the iron content in spinach by an order of magnitude.  While there are actually only 3.5 milligrams of iron in a 100-gram serving of spinach, the accepted fact became 35 milligrams.  Once this incorrect number was printed, spinach’s nutritional value became legendary.  So when Popeye was created, studio executives recommended he eat spinach for his strength, due to its vaunted health properties, and apparently Popeye helped increase American consumption of spinach by a third!”

“This error was eventually corrected in 1937,” Arbesman continued, “when someone rechecked the numbers.  But the damage had been done.  It spread and spread, and only recently has gone by the wayside, no doubt helped by Popeye’s relative obscurity today.  But the error was so widespread, that the British Medical Journal published an article discussing this spinach incident in 1981, trying its best to finally debunk the issue.”

“Ultimately, the reason these errors spread,” Arbesman concluded, “is because it’s a lot easier to spread the first thing you find, or the fact that sounds correct, than to delve deeply into the literature in search of the correct fact.”

What “spinach” has your organization been falsely consuming because of a data quality issue that was not immediately obvious, and which may have led to a long, and perhaps ongoing, history of data-driven decisions based on poor quality data?

Popeye said “I yam what I yam!”  Your organization yams what your data yams, so you had better make damn sure it’s correct.
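As a purely illustrative sketch (not something the original post prescribes), here is the kind of range-based sanity check that can catch a von Wolf-style transcription error: it flags values that fall outside an expected range, and notes when dividing by ten would bring them back inside it, which is the signature of a misplaced decimal point.  The reference range and records below are made up for the example.

```python
def check_decimal_slip(records, expected_ranges):
    """Flag values outside an expected range, noting when dividing by 10
    would bring them back in range (a hint of a misplaced decimal point)."""
    findings = []
    for record_id, field, value in records:
        low, high = expected_ranges[field]
        if low <= value <= high:
            continue  # value looks plausible
        findings.append({
            "record": record_id,
            "field": field,
            "value": value,
            "suspect_decimal_slip": low <= value / 10 <= high,
        })
    return findings

# Hypothetical reference range: iron in spinach, mg per 100-gram serving
expected_ranges = {"iron_mg_per_100g": (1.0, 6.0)}

records = [
    ("spinach_1870", "iron_mg_per_100g", 35.0),  # von Wolf's transcribed value
    ("spinach_1937", "iron_mg_per_100g", 3.5),   # the corrected value
]

for finding in check_decimal_slip(records, expected_ranges):
    print(finding)
```

Run against these records, the 35-milligram “fact” is flagged immediately, and the decimal-slip hint suggests a transcription error rather than a genuinely miraculous vegetable.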

 

Related Posts

The Family Circus and Data Quality

Can Data Quality avoid the Dustbin of History?

Retroactive Data Quality

Spartan Data Quality

Pirates of the Computer: The Curse of the Poor Data Quality

The Tooth Fairy of Data Quality

The Dumb and Dumber Guide to Data Quality

Darth Data

Occurred, a data defect has . . .

The Data Quality Placebo

Data Quality is People!

DQ-View: The Five Stages of Data Quality

DQ-BE: Data Quality Airlines

Wednesday Word: Quality-ish

The Five Worst Elevator Pitches for Data Quality

Shining a Social Light on Data Quality

The Poor Data Quality Jar

Data Quality and #FollowFriday the 13th

Dilbert, Data Quality, Rabbits, and #FollowFriday

Data Love Song Mashup

Open Source Business Intelligence

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I discuss open source business intelligence (OSBI) with Lyndsay Wise, author of the insightful new book Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize ROI.

Lyndsay Wise is the President and Founder of WiseAnalytics, an independent analyst firm and consultancy specializing in business intelligence for small and mid-sized organizations.  For more than ten years, Lyndsay Wise has assisted clients in business systems analysis, software selection, and implementation of enterprise applications.

Lyndsay Wise conducts regular research studies, consults, writes articles, and speaks about how to implement a successful business intelligence approach and improve the value of business intelligence within organizations.

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Quality and Anton’s Syndrome

In his book Incognito: The Secret Lives of the Brain, David Eagleman discussed aspects of a bizarre, and rare, brain disorder called Anton’s Syndrome in which a stroke renders a person blind — but the person denies their blindness.

“Those with Anton’s Syndrome truly believe they are not blind,” Eagleman explained.  “It is only after bumping into enough furniture and walls that they begin to feel that something is amiss.  They are experiencing what they take to be vision, but it is all internally generated.  The external data is not getting to the right places because of the stroke, and so their reality is simply that which is generated by the brain, with little attachment to the real world.  In this sense, what they experience is no different from dreaming, drug trips, or hallucinations.”

Data quality practitioners often complain that business leaders are blind to the importance of data quality to business success, or that they deny data quality issues exist in their organization.  As much as we wish it wasn’t so, often it isn’t until business leaders bump into enough of the negative effects of poor data quality that they begin to feel that something is amiss.  However, as much as we would like to, we can’t really attribute their denial to drug-induced hallucinations.

Sometimes an illusion-of-quality effect is caused when data is excessively filtered and cleansed before it reaches business leaders.  This can result from a perception filter for data quality issues, created as a natural self-defense mechanism by the people responsible for the business processes and technology surrounding data, since no one wants to be blamed for causing, or failing to fix, data quality issues.  Unfortunately, this might really leave the organization’s data with little attachment to the real world.

In fairness, sometimes it’s also the blind leading the blind because data quality practitioners often suffer from business blindness by presenting data quality issues without providing business context, without relating data quality metrics in a tangible manner to how the business uses data to support a business process, accomplish a business objective, or make a business decision.

A lot of the disconnect between business leaders, who believe they are not blind to data quality, and data quality practitioners, who believe they are not blind to business context, comes from a crisis of perception.  Each side in this debate believes they have a complete vision, but it’s only after bumping into each other enough times that they begin to envision the organizational blindness caused when data quality is not properly measured within a business context and continually monitored.

 

Related Posts

Data Quality and Chicken Little Syndrome

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Availability Bias and Data Quality Improvement

Finding Data Quality

“Some is not a number and soon is not a time”

The Data Quality of Dorian Gray

The Data Quality Wager

DQ-View: The Five Stages of Data Quality

Data Quality and the Bystander Effect

Data Quality and the Q Test

Why isn’t our data quality worse?

The Illusion-of-Quality Effect

Perception Filters and Data Quality

WYSIWYG and WYSIATI

Predictably Poor Data Quality

Data Psychedelicatessen

Data Geeks and Business Blindness

The Real Data Value is Business Insight

Is your data accurate, but useless to your business?

Data Quality Measurement Matters

Data Myopia and Business Relativity

Data and its Relationships with Quality

Plato’s Data

An Enterprise Resolution

This blog post is sponsored by the Enterprise CIO Forum and HP.

Since just before Christmas I posted An Enterprise Carol, I decided just before New Year’s to post An Enterprise Resolution.

In her article The Irrational Allure of the Next Big Thing, Karla Starr examined why people value potential over achievement in books, sports, and politics.  However, her findings apply equally well to technology and the enterprise’s relationship with IT.

“Subjectivity and hype,” Starr explained, “make people particularly prone to falling for Next Best Thing-ism.”

“Our collective willingness to jump on the bandwagon,” Starr continued, “seems at odds with one of psychology’s most robust findings: We are averse to uncertainty.  But as it turns out, our reaction to incomplete information depends on our interpretation of the scant data we do have.  Uncertainty is a sort of amplifier, intensifying our response whether it’s positive or negative.  As long as we react positively to the little information shown, we’re actually attracted to uncertainty.  It’s curiosity rather than knowledge that leads to increased cognitive engagement.  If the only information at hand is positive, your mind is going to fill in the gaps with other positive details.  A whiff of positive information is all we need to set our minds aflutter.”

In his book Thinking, Fast and Slow, Daniel Kahneman explained “when people are favorably disposed toward a technology, they rate it as offering large benefits and imposing little risk; when they dislike a technology, they can think only of its disadvantages, and few advantages come to mind.  People who receive a message extolling the benefits of a technology also change their beliefs about its risks.  Good technologies have few costs in the imaginary world we inhabit, bad technologies have no benefits, and all decisions are easy.  In the real world of course, we often face painful tradeoffs between benefits and costs.”

In his book What Technology Wants, Kevin Kelly explained that technology has a social dimension beyond the mere functionality it provides.  “We adopt new technologies largely because of what they do for us, but also in part because of what they mean to us.  Often we refuse to adopt technology for the same reason: because of how the avoidance reinforces or shapes our identity.”

So, in 2013, as the big data hype cycle comes down from the peak of inflated expectations, as the painful tradeoffs between the benefits and costs of cloud computing are faced, and as IT consumerization continues to reshape the identity of the IT function, let’s make an enterprise resolution to deal with these realities before we go off chasing the next best thing.  Happy New Year!

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

An Enterprise Carol

Why does the sun never set on legacy applications?

Are Applications the La Brea Tar Pits for Data?

The Diffusion of the Consumerization of IT

Serving IT with a Side of Hash Browns

The Cloud is shifting our Center of Gravity

A Swift Kick in the AAS

Sometimes all you Need is a Hammer

Shadow IT and the New Prometheus

The IT Consumerization Conundrum

The Diderot Effect of New Technology

More Tethered by the Untethered Enterprise?

The Return of the Dumb Terminal

Magic Elephants, Data Psychics, and Invisible Gorillas

Big Data el Memorioso

Information Overload Revisited

The Limitations of Historical Analysis

OCDQ Radio - The Evolution of Enterprise Security

Enterprise Security and Social Engineering

Can the Enterprise really be Secured?

Big Data is not just for Big Businesses

“It is widely assumed that big data, which imbues a sense of grandiosity, is only for those large enterprises with enormous amounts of data and the dedicated IT staff to tackle it,” opens the recent article Big data: Why it matters to the midmarket.

Much of the noise generated these days about the big business potential of big data certainly seems to contain very little signal directed at small and midsize businesses.  Although it’s true that big businesses generate more data, faster, and in more varieties, a considerable amount of big data is externally generated, much of which is freely available for use by businesses of all sizes.

The easiest example is the poster child for leveraging big data — Google Search.  But there’s also a growing number of open data sources (e.g., weather data) and social data sources (e.g., Twitter), and, since more of the world is becoming directly digitized, more businesses are now using more data no matter how big they are.  Additionally, as Phil Simon wrote about in The New Small, the free and open source software, as-a-service, cloud, mobile, and social technology trends driving the consumerization of IT are enabling small and midsize businesses to, among other things, use more data and be more competitive with big businesses.

“Each minute of every day, information is produced about the activities of your business, your customers, and your industry,” explained Sarita Harbour in her recent blog post Harnessing Big Data: Giving Midsize Business a Competitive Edge.  “Hidden within this enormous amount of data are trends, patterns, and indicators that, if extracted and identified, can yield important information to make your business more efficient and more competitive, and ultimately, it can make you more money.”

However, the biggest driver of the misperception about big data is its over-identification with data volume, which is why, earlier this year in his blog post It’s time for a new definition of big data, Robert Hillard used several examples to explain that big data refers more to big complexity than big volume.  While acknowledging that complex datasets tend to grow rapidly, thus making big data voluminous, his wonderfully pithy conclusion was that “big data can be very small and not all large datasets are big.”

Therefore, by extension we could say that the businesses using big data can be small, or mid-sized, and not all the businesses using big data are big.  But, of course, that’s not quite pithy enough.  So let’s simply say that big data is not just for big businesses.

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

Related Posts

Will Big Data be Blinded by Data Science?

Big Data Lessons from Orbitz

The Graystone Effects of Big Data

Word of Mouth has become Word of Data

Information Asymmetry versus Empowered Customers

Talking Business about the Weather

Magic Elephants, Data Psychics, and Invisible Gorillas

Open MIKE Podcast — Episode 05: Defining Big Data

Open MIKE Podcast — Episode 06: Getting to Know NoSQL

OCDQ Radio - Data Quality and Big Data

HoardaBytes and the Big Data Lebowski

Sometimes it’s Okay to be Shallow

How Predictable Are You?

The Wisdom of Crowds, Friends, and Experts

Exercise Better Data Management

A Tale of Two Datas

Darth Vader, Big Data, and Predictive Analytics

The Big Data Theory

Data Management: The Next Generation

Big Data: Structure and Quality

The Wisdom of Crowds, Friends, and Experts

I recently finished reading the TED Book by Jim Hornthal, A Haystack Full of Needles, which included an overview of the different predictive approaches taken by one of the most common forms of data-driven decision making in the era of big data, namely, the recommendation engines increasingly provided by websites, social networks, and mobile apps.

These recommendation engines primarily employ one of three techniques, choosing to base their data-driven recommendations on the “wisdom” provided by either crowds, friends, or experts.

 

The Wisdom of Crowds

In his book The Wisdom of Crowds, James Surowiecki explained that the four conditions characterizing wise crowds are diversity of opinion, independent thinking, decentralization, and aggregation.  Amazon is a great example of a recommendation engine using this approach by assuming that a sufficiently large population of buyers is a good proxy for your purchasing decisions.

For example, Amazon tells you that people who bought James Surowiecki’s bestselling book also bought Thinking, Fast and Slow by Daniel Kahneman, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business by Jeff Howe, and Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott.  However, Amazon neither provides nor possesses knowledge of why people bought all four of these books, nor any qualification of the subject matter expertise of these readers.

These concerns, which we could think of as potential data quality issues, would be exacerbated within a small amount of transaction data, where the eclectic tastes and idiosyncrasies of individual readers would not help us decide what books to buy.  Within a large amount of transaction data, however, we achieve the Wisdom of Crowds effect: taken in aggregate, the data gives us a general sense of what books we might like to read based on what a diverse group of readers collectively makes popular.

As I blogged about in my post Sometimes it’s Okay to be Shallow, sometimes the aggregated, general sentiment of a large group of unknown, unqualified strangers will be sufficient to effectively make certain decisions.
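As a rough illustration of the aggregation at work here, and nothing like Amazon’s actual (and proprietary) algorithms, here is a minimal sketch of item-to-item co-occurrence counting over a handful of made-up orders:

```python
from collections import defaultdict
from itertools import combinations

# Made-up order history: each order is the set of books one customer bought
orders = [
    {"The Wisdom of Crowds", "Thinking, Fast and Slow"},
    {"The Wisdom of Crowds", "Crowdsourcing", "Wikinomics"},
    {"The Wisdom of Crowds", "Thinking, Fast and Slow", "Wikinomics"},
    {"Thinking, Fast and Slow", "Wikinomics"},
]

# Count how often each pair of books appears in the same order
co_occurrence = defaultdict(int)
for order in orders:
    for book_a, book_b in combinations(sorted(order), 2):
        co_occurrence[(book_a, book_b)] += 1
        co_occurrence[(book_b, book_a)] += 1

def people_also_bought(book, top_n=3):
    """Rank other books by how often they were bought alongside this one."""
    counts = {b: n for (a, b), n in co_occurrence.items() if a == book}
    return sorted(counts, key=counts.get, reverse=True)[:top_n]

print(people_also_bought("The Wisdom of Crowds"))
```

With only a few orders, the idiosyncrasies of individual buyers dominate the counts, which is exactly the small-data caveat above; the approach only starts to look “wise” once the aggregation spans a sufficiently large and diverse crowd.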

 

The Wisdom of Friends

Although the influence of our friends and family is the oldest form of data-driven decision making, historically this influence was delivered by word of mouth, which required you to either be there to hear those influential words when they were spoken, or have a large enough network of people you knew who would eventually be able to pass those words along to you.

But the rise of social networking services, such as Twitter and Facebook, has transformed word of mouth into word of data by transcribing our words into short bursts of social data, such as status updates, online reviews, and blog posts.

Facebook “Likes” are a great example of a recommendation engine that uses the Wisdom of Friends, where our decision to buy a book, see a movie, or listen to a song might be based on whether or not our friends like it.  Of course, “friends” is used in a very loose sense in a social network, and not just on Facebook, since it combines strong connections such as actual friends and family, with weak connections such as acquaintances, friends of friends, and total strangers from the periphery of our social network.

Social influence has never been limited to the people we know well, as Nicholas Christakis and James Fowler explained in their book Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives.  But the hyper-connected world enabled by the Internet, and further facilitated by mobile devices, has strengthened the social influence of weak connections, and these friends form a smaller crowd whose wisdom is involved in more of our decisions than we may even be aware of.
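To make the contrast with the crowd-based approach concrete, here is a minimal sketch of a friends-based recommendation that weights each “like” by tie strength; the connections, weights, and likes are all invented for the example, and real social recommendation engines are considerably more elaborate:

```python
# Invented tie strengths: 1.0 for close friends and family, lower for weak connections
tie_strength = {"sister": 1.0, "college_friend": 0.8, "coworker": 0.5, "acquaintance": 0.2}

# Invented "likes" data: which books each connection has liked
likes = {
    "sister": {"Connected", "The Wisdom of Crowds"},
    "college_friend": {"Connected"},
    "coworker": {"Wikinomics"},
    "acquaintance": {"Wikinomics", "Connected"},
}

def friend_recommendations(already_read, top_n=3):
    """Score each item by the summed tie strength of the friends who liked it."""
    scores = {}
    for friend, items in likes.items():
        for item in items - already_read:
            scores[item] = scores.get(item, 0.0) + tie_strength[friend]
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(friend_recommendations(already_read={"The Wisdom of Crowds"}))
```

Notice that the weak connections still contribute to the scores; they just contribute less, mirroring the point above that the hyper-connected world has strengthened, rather than silenced, their influence.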

 

The Wisdom of Experts

Since it’s more common to associate wisdom with expertise, Pandora is a great example of a recommendation engine that uses the Wisdom of Experts.  Pandora used a team of musicologists (professional musicians and scholars with advanced degrees in music theory) to deconstruct more than 800,000 songs into 450 musical elements that make up each performance, including qualities of melody, harmony, rhythm, form, composition, and lyrics, as part of what Pandora calls the Music Genome Project.

As Pandora explains, their methodology uses precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control to ensure that data integrity remains reliably high, believing that delivering a great radio experience to each and every listener requires an incredibly broad and deep understanding of music.

Essentially, experts form the smallest crowd of wisdom.  Of course, experts are not always right.  At the very least, experts are not right about every one of their predictions.  Nor do experts always agree with each other, which is why I imagine that one of the most challenging aspects of the Music Genome Project is getting music experts to consistently apply precisely the same methodology.

Pandora also acknowledges that each individual has a unique relationship with music (i.e., no one else has tastes exactly like yours), and allows you to “Thumbs Up” or “Thumbs Down” songs without affecting other users, producing more personalized results than either the popularity predicted by the Wisdom of Crowds or the similarity predicted by the Wisdom of Friends.
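As a miniature sketch of what expert-curated, attribute-based recommendation looks like (the Music Genome Project’s roughly 450 attributes and its methodology are proprietary, so the handful of attributes and scores below are invented), we can build a listener profile from thumbed-up songs and rank other songs by cosine similarity to that profile:

```python
import math

# Invented expert-assigned attribute scores (0.0 to 1.0) for a few songs
song_attributes = {
    "Song A": [0.9, 0.2, 0.7, 0.1],  # e.g., prominent rhythm, minor tonality, ...
    "Song B": [0.8, 0.3, 0.6, 0.2],
    "Song C": [0.1, 0.9, 0.2, 0.8],
}

def cosine_similarity(u, v):
    """Similarity of two attribute vectors, ignoring their overall magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(thumbed_up, candidates):
    """Average the attribute vectors of thumbed-up songs into a taste profile,
    then rank candidate songs by similarity to that profile."""
    profile = [sum(values) / len(thumbed_up)
               for values in zip(*(song_attributes[s] for s in thumbed_up))]
    return sorted(candidates,
                  key=lambda s: cosine_similarity(profile, song_attributes[s]),
                  reverse=True)

print(recommend(thumbed_up=["Song A"], candidates=["Song B", "Song C"]))
```

A “Thumbs Down” could be handled by subtracting from the profile instead of adding to it; either way, each listener’s profile remains personal, which is why one listener’s thumbs never affect another listener’s results.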

 

The Future of Wisdom

It’s interesting to note that the Wisdom of Experts is the only one of these approaches that relies on what data management and business intelligence professionals would consider a rigorous approach to data quality and decision quality best practices.  But this is also why the Wisdom of Experts is the most time-consuming and expensive approach to data-driven decision making.

In the past, the Wisdom of Crowds and Friends was ignored in data-driven decision making for the simple reason that this potential wisdom wasn’t digitized.  But now, in the era of big data, not only are crowds and friends digitized, but technological advancements combined with cost-effective options via open source (data and software) and cloud computing make these approaches quicker and cheaper than the Wisdom of Experts.  And despite the potential data quality and decision quality issues, the Wisdom of Crowds and/or Friends is proving itself a viable option for more categories of data-driven decision making.

I predict that the future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three potential sources of wisdom often acknowledged as contributors to data-driven decision making.

 

Related Posts

Sometimes it’s Okay to be Shallow

Word of Mouth has become Word of Data

The Wisdom of the Social Media Crowd

Data Management: The Next Generation

Exercise Better Data Management

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

The Big Data Theory

Finding a Needle in a Needle Stack

Big Data, Predictive Analytics, and the Ideal Chronicler

The Limitations of Historical Analysis

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

HoardaBytes and the Big Data Lebowski

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

A Tale of Two Datas

Social Business is more than Social Marketing

Although much of the early business use of social media was largely focused on broadcasting marketing messages at customers, social media transformed word of mouth into word of data and empowered customers to add their voice to marketing messages, forcing marketing to evolve from monologues to dialogues.  But is the business potential of social media limited to marketing?

During the MidMarket IBM Social Business #Futurecast, a panel discussion from earlier this month, Ed Brill, author of the forthcoming book Opting In: Lessons in Social Business from a Fortune 500 Product Manager, defined the term social business as “an organization that engages employees in a socially-enabled process that brings together how employees interact with each other, partners, customers, and the marketplace.  It’s about bringing all the right people, both internally and externally, together in a conversation to solve problems, be innovative and responsive, and better understand marketplace dynamics.”

“Most midsize businesses today,” Laurie McCabe commented, “are still grappling with how to supplement traditional applications and tools with some of the newer social business tools.  Up until now, the focus has been on integrating social media into a lot of marketing communications, and we haven’t yet seen the integration of social media into other business processes.”

“Midsize businesses understand,” Handly Cameron remarked, “how important it is to get into social media, but they’re usually so focused on daily operations that they think that a social business is simply one that uses social media, and therefore they cite the facts that they created Twitter and Facebook accounts as proof that they are a social business, but again, they are focusing on external uses of social media and not internal uses such as improving employee collaboration.”

Collaboration was a common theme throughout the panel discussion.  Brill said a social business is one that has undergone the cultural transformation required to embrace the fact that it is a good idea to share knowledge.  McCabe remarked that the leadership of a social business rewards employees for sharing knowledge, not for hoarding knowledge.  She also emphasized the importance of culture before tools since simply giving individuals social tools will not automatically create a collaborative culture.

Cameron also noted how the widespread adoption of cloud computing and mobile devices is helping to drive the adoption of social tools for collaboration, and helping to break down a lot of the traditional boundaries to knowledge sharing, especially as more organizations are becoming less bounded by the physical proximity of their employees, partners, and customers.

From my perspective, even though marketing might have been how social media got in the front door of many organizations, social media has always been about knowledge sharing and collaboration.  And with mobile, cloud, and social technologies so integrated into our personal and professional lives, life and business are both more social and collaborative than ever before.  So, even if collaboration isn’t in the genes of your organization, it’s no longer possible to put the collaboration genie back in the bottle.

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

Related Posts

Social Media Marketing: From Monologues to Dialogues

OCDQ Radio - Social Media for Midsize Businesses

Word of Mouth has become Word of Data

Information Asymmetry versus Empowered Customers

OCDQ Radio - Social Media Strategy

The Challenging Gift of Social Media

Listening and Broadcasting

Quality is more important than Quantity

Demystifying Social Media

Social Karma

The Limitations of Historical Analysis

This blog post is sponsored by the Enterprise CIO Forum and HP.

“Those who cannot remember the past are condemned to repeat it,” wrote George Santayana in the early 20th century to caution us about not learning the lessons of history.  But with the arrival of the era of big data and the dawn of the data scientist in the early 21st century, it seems like we no longer have to worry about this problem, since not only is big data allowing us to digitize history, but data science is also building sophisticated statistical models with which we can analyze history in order to predict the future.

However, “every model is based on historical assumptions and perceptual biases,” Daniel Rasmus blogged. “Regardless of the sophistication of the science, we often create models that help us see what we want to see, using data selected as a good indicator of such a perception.”  Although perceptual bias is a form of the data silence I previously blogged about, even absent such a bias, there are limitations to what we can predict about the future based on our analysis of the past.

“We must remember that all data is historical,” Rasmus continued. “There is no data from or about the future.  Future context changes cannot be built into a model because they cannot be anticipated.”  Rasmus used the example that no models of retail supply chains in 1962 could have predicted the disruption eventually caused by that year’s debut of a small retailer in Arkansas called Wal-Mart.  And no models of retail supply chains in 1995 could have predicted the disruption eventually caused by that year’s debut of an online retailer called Amazon.  “Not only must we remember that all data is historical,” Rasmus explained, “but we must also remember that at some point historical data becomes irrelevant when the context changes.”

As I previously blogged, despite what its name implies, predictive analytics can’t predict what’s going to happen with certainty, but it can predict some of the possible things that could happen with a certain probability.  Another important distinction is that “there is a difference between being uncertain about the future and the future itself being uncertain,” Duncan Watts explained in his book Everything is Obvious (Once You Know the Answer).  “The former is really just a lack of information — something we don’t know — whereas the latter implies that the information is, in principle, unknowable.  The former is an orderly universe, where if we just try hard enough, if we’re just smart enough, we can predict the future.  The latter is an essentially random world, where the best we can ever hope for is to express our predictions of various outcomes as probabilities.”

“When we look back to the past,” Watts explained, “we do not wish that we had predicted what the search market share for Google would be in 1999.  Instead we would end up wishing we’d been able to predict on the day of Google’s IPO that within a few years its stock price would peak above $500, because then we could have invested in it and become rich.  If our prediction does not somehow help to bring about larger results, then it is of little interest or value to us.  We care about things that matter, yet it is precisely these larger, more significant predictions about the future that pose the greatest difficulties.”

Although we should heed Santayana’s caution and try to learn history’s lessons in order to factor into our predictions about the future what was relevant from the past, as Watts cautioned, there will be many times when “what is relevant can’t be known until later, and this fundamental relevance problem can’t be eliminated simply by having more information or a smarter algorithm.”

Although big data and data science can certainly help enterprises learn from the past in order to predict some probable futures, the future does not always resemble the past.  So, remember the past, but also remember the limitations of historical analysis.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

WYSIWYG and WYSIATI

Will Big Data be Blinded by Data Science?

Big Data el Memorioso

Information Overload Revisited

HoardaBytes and the Big Data Lebowski

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

The Big Data Theory

Finding a Needle in a Needle Stack

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

A Tale of Two Datas

Availability Bias and Data Quality Improvement

The availability heuristic is a mental shortcut that occurs when people make judgments based on the ease with which examples come to mind.  Although this heuristic can be beneficial, such as when it helps us recall examples of a dangerous activity to avoid, sometimes it leads to availability bias, where we’re affected more strongly by the ease of retrieval than by the content retrieved.

In his thought-provoking book Thinking, Fast and Slow, Daniel Kahneman explained how availability bias works by recounting an experiment where different groups of college students were asked to rate a course they had taken the previous semester by listing ways to improve the course — while varying the number of improvements that different groups were required to list.

Counterintuitively, students in the group required to list more necessary improvements gave the course a higher rating, whereas students in the group required to list fewer necessary improvements gave the course a lower rating.

According to Kahneman, the extra cognitive effort expended by the students required to list more improvements biased them into believing it was difficult to list necessary improvements, leading them to conclude that the course didn’t need much improvement, and conversely, the little cognitive effort expended by the students required to list few improvements biased them into concluding, since it was so easy to list necessary improvements, that the course obviously needed improvement.

This is counterintuitive because you’d think that the students would rate the course based on an assessment of the information retrieved from their memory regardless of how easy that information was to retrieve.  It would have made more sense for the course to be rated higher for needing fewer improvements, but availability bias led the students to the opposite conclusion.

Availability bias can also affect an organization’s discussions about the need for data quality improvement.

If you asked stakeholders to rate the organization’s data quality by listing business-impacting incidents of poor data quality, would they reach a different conclusion if you asked them to list one incident versus asking them to list at least ten incidents?

In my experience, an event where poor data quality negatively impacted the organization, such as a regulatory compliance failure, is often easily dismissed by stakeholders as an isolated incident to be corrected by a one-time data cleansing project.

But would forcing stakeholders to list ten business-impacting incidents of poor data quality make them concede that data quality improvement should be supported by an ongoing program?  Or would the extra cognitive effort bias them into concluding, since it was so difficult to list ten incidents, that the organization’s data quality doesn’t really need much improvement?

I think that the availability heuristic helps explain why most organizations easily approve reactive data cleansing projects, and availability bias helps explain why most organizations usually resist proactively initiating a data quality improvement program.

 

Related Posts

DQ-View: The Five Stages of Data Quality

Data Quality: Quo Vadimus?

Data Quality and Chicken Little Syndrome

The Data Quality Wager

You only get a Return from something you actually Invest in

“Some is not a number and soon is not a time”

Why isn’t our data quality worse?

Data Quality and the Bystander Effect

Data Quality and the Q Test

Perception Filters and Data Quality

Predictably Poor Data Quality

WYSIWYG and WYSIATI

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Social Media for Midsize Businesses

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Paul Gillin and I discuss social media for midsize businesses, including how the less marketing you do, the more effective you will be with social media marketing; the war of generosity, where the more you give, the more you get; and the importance of the trust equation, which means the more people trust you, the more they will want to do business with you.

Paul Gillin is a veteran technology journalist and a thought leader in new media.  Since 2005, he has advised marketers and business executives on strategies to optimize their use of social media and online channels to reach buyers cost-effectively.  He is a popular speaker who is known for his ability to simplify complex concepts using plain talk, anecdotes, and humor.

Paul Gillin is the author of four books about social marketing: The New Influencers (2007), Secrets of Social Media Marketing (2008), Social Marketing to the Business Customer (2011), co-authored with Eric Schwartzman, and the forthcoming book Attack of the Customers (2012), co-authored with Greg Gianforte.

Paul Gillin was previously the founding editor of TechTarget and editor-in-chief of Computerworld.  He writes a monthly column for BtoB magazine and is an active blogger and media commentator.  He has appeared as an expert commentator on CNN, PBS, Fox News, MSNBC, and other television outlets.  He has also been quoted or interviewed for hundreds of news and radio reports in outlets such as The Wall Street Journal, The New York Times, NPR, and the BBC.  Paul Gillin is a Senior Research Fellow and member of the board of directors at the Society for New Communications Research.

Cooks, Chefs, and Data Governance

In their book Practical Wisdom, Barry Schwartz and Kenneth Sharpe quoted retired Lieutenant Colonel Leonard Wong, who is a Research Professor of Military Strategy in the Strategic Studies Institute at the United States Army War College, focusing on the human and organizational dimensions of the military.

“Innovation,” Wong explained, “develops when an officer is given a minimal number of parameters (e.g., task, condition, and standards) and the requisite time to plan and execute the training.  Giving the commanders time to create their own training develops confidence in operating within the boundaries of a higher commander’s intent without constant supervision.”

According to Wong, too many rules and requirements “remove all discretion, resulting in reactive instead of proactive thought, compliance instead of creativity, and adherence instead of audacity.”  Wong believed that it came down to a difference between cooks, those who are quite adept at carrying out a recipe, and chefs, those who can look at the ingredients available to them and create a meal.  A successful military strategy is executed by officers who are trained to be chefs, not cooks.

Data Governance’s Kitchen

Data governance requires the coordination of a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

Because of this complexity, many organizations think the only way to run data governance’s kitchen is to institute a bureaucracy that dictates policies and demands compliance.  In other words, data governance policies are recipes and employees are cooks.

Although implementing data governance policies does occasionally require a cook-adept-at-carrying-out-a-recipe mindset, the long-term success of a data governance program is also going to require chefs, since the dynamic challenges faced, and overcome daily, by business analysts, data stewards, technical architects, and others, exemplify today’s constantly changing business world, which cannot be successfully governed by forcing employees to systematically apply rules or follow rigid procedures.

Data governance requires chefs who are empowered with an understanding of the principles of the policies, and who are trusted to figure out how to best implement the policies in a particular business context by combining rules with the organizational ingredients available to them, and creating a flexible procedure that operates within the boundaries of the policy’s principles.

But, of course, just like a military cannot be staffed entirely by officers, and a kitchen cannot be staffed entirely by chefs, in order to implement a data governance program successfully, an organization needs both cooks and chefs.

Similar to how data governance is neither all-top-down nor all-bottom-up, it’s also neither all-cook nor all-chef.

Only the unique corporate culture of your organization can determine how to best staff your data governance kitchen.

The Return of the Dumb Terminal

This blog post is sponsored by the Enterprise CIO Forum and HP.

In his book What Technology Wants, Kevin Kelly observed “computers are becoming ever more general-purpose machines as they swallow more and more functions.  Entire occupations and their workers’ tools have been subsumed by the contraptions of computation and networks.  You can no longer tell what a person does by looking at their workplace, because 90 percent of employees are using the same tool — a personal computer.  Is that the desk of the CEO, the accountant, the designer, or the receptionist?  This is amplified by cloud computing, where the actual work is done on the net as a whole and the tool at hand merely becomes a portal to the work.  All portals have become the simplest possible window — a flat screen of some size.”

Although I am an advocate for cloud computing and cloud-based services, sometimes I can’t help but wonder if cloud computing is turning our personal computers back into that simplest of all possible windows that we called the dumb terminal.

Twenty years ago, at the beginning of my IT career, when I was a mainframe production support specialist, my employer gave me a dumb terminal to take home for connecting to the mainframe via my dial-up modem.  Since I used it late at night when dealing with nightly production issues, the aptly nicknamed green machine (its entirely text-based display used bright green characters) would make my small apartment eerily glow green, which convinced my roommate and my neighbors that I was some kind of mad scientist performing unsanctioned midnight experiments with radioactive materials.

The dumb terminal was so-called because, when not connected to the mainframe, it was essentially a giant paperweight since it provided no offline functionality.  Nowadays, our terminals (smartphones, tablets, and laptops) are smarter, but in some sense, with more functionality moving to the cloud, even though they provide varying degrees of offline functionality, our terminals get dumbed back down when they’re not connected to the web or a mobile network, because most of what we really need is online.

It can even be argued that smartphones and tablets were actually designed to be dumb terminals because they intentionally offer limited offline data storage and computing power, and are mostly based on a mobile-app-portal-to-the-cloud computing model, which is well-supported by the widespread availability of high-speed network connectivity options (broadband, mobile, Wi-Fi).

Laptops (and the dwindling number of desktops) are the last bastions of offline data storage and computing power.  Moving more of those applications and data to the cloud would help eliminate redundant applications and duplicated data, and make it easier to use the right technology for a specific business problem.  And if most of our personal computers were dumb terminals, then our smart people could concentrate more on the user experience aspects of business-enabling information technology.

Perhaps the return of the dumb terminal is a smart idea after all.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

A Swift Kick in the AAS

The UX Factor

The Partly Cloudy CIO

Are Cloud Providers the Bounty Hunters of IT?

The Cloud Security Paradox

Sometimes all you Need is a Hammer

Why does the sun never set on legacy applications?

Are Applications the La Brea Tar Pits for Data?

The Diffusion of the Consumerization of IT

More Tethered by the Untethered Enterprise?

How Data Cleansing Saves Lives

When it comes to data quality best practices, it’s often argued, and sometimes quite vehemently, that proactive defect prevention is far superior to reactive data cleansing.  Advocates of defect prevention sometimes admit that data cleansing is a necessary evil.  However, at least in my experience, most of the time they conveniently, and ironically, cleanse (i.e., drop) the word necessary.

Therefore, I thought I would share a story about how data cleansing saves lives, which I read about in the highly recommended book Space Chronicles: Facing the Ultimate Frontier by Neil deGrasse Tyson.  “Soon after the Hubble Space Telescope was launched in April 1990, NASA engineers realized that the telescope’s primary mirror—which gathers and reflects the light from celestial objects into its cameras and spectrographs—had been ground to an incorrect shape.  In other words, the two-billion dollar telescope was producing fuzzy images.  That was bad.  As if to make lemonade out of lemons, though, computer algorithms came to the rescue.  Investigators at the Space Telescope Science Institute in Baltimore, Maryland, developed a range of clever and innovative image-processing techniques to compensate for some of Hubble’s shortcomings.”

In other words, since it would be three years before Hubble’s faulty optics could be repaired during a 1993 space shuttle mission, data cleansing allowed astrophysicists to make good use of Hubble despite the bad data quality of its early images.
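For readers curious what this kind of image cleansing looks like in code, below is a minimal sketch of Richardson-Lucy deconvolution, one of the classic restoration techniques applied to early Hubble images; the blurred image here is synthetic, and the actual pipelines developed at the Space Telescope Science Institute were far more sophisticated:

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(observed, psf, iterations=30):
    """Iteratively estimate the underlying image, given the point spread
    function (psf) that blurred it."""
    estimate = np.full(observed.shape, 0.5)
    psf_mirror = psf[::-1, ::-1]
    for _ in range(iterations):
        blurred_estimate = fftconvolve(estimate, psf, mode="same")
        relative_blur = observed / (blurred_estimate + 1e-12)
        estimate = estimate * fftconvolve(relative_blur, psf_mirror, mode="same")
    return estimate

# Synthetic example: two point sources blurred by a Gaussian point spread function
truth = np.zeros((64, 64))
truth[20, 20] = 1.0
truth[40, 44] = 0.6

y, x = np.mgrid[-7:8, -7:8]
psf = np.exp(-(x**2 + y**2) / (2 * 2.5**2))
psf /= psf.sum()

blurred = fftconvolve(truth, psf, mode="same")
restored = richardson_lucy(blurred, psf)

print(f"Peak brightness: truth={truth.max():.2f}, "
      f"blurred={blurred.max():.2f}, restored={restored.max():.2f}")
```

After a few dozen iterations, the flux smeared out by the point spread function is concentrated back toward the original point sources, which, per Tyson’s point below, is the same kind of information extraction that later proved useful for mammograms.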

So, data cleansing algorithms saved Hubble’s fuzzy images — but how did this data cleansing actually save lives?

“Turns out,” Tyson explained, “maximizing the amount of information that could be extracted from a blurry astronomical image is technically identical to maximizing the amount of information that can be extracted from a mammogram.  Soon the new techniques came into common use for detecting early signs of breast cancer.”

“But that’s only part of the story.  In 1997, for Hubble’s second servicing mission, shuttle astronauts swapped in a brand-new, high-resolution digital detector—designed to the demanding specifications of astrophysicists whose careers are based on being able to see small, dim things in the cosmos.  That technology is now incorporated in a minimally invasive, low-cost system for doing breast biopsies, the next stage after mammograms in the early diagnosis of cancer.”

Even though defect prevention was eventually implemented to prevent data quality issues in Hubble’s images of outer space, those interim data cleansing algorithms are still being used today to help save countless human lives here on Earth.

So, at least in this particular instance, we have to admit that data cleansing is a necessary good.